Binary Search Trees
A Binary Search Tree (BST) T is a binary tree where each vertex v has an associated label \mathsf{KEY}(v). T is either empty, or it contains an element x at its root and
- the left subtree T_L and the right subtree T_R are both binary search trees, and
- for all vertices y in T_L, \mathsf{KEY}(y) <\mathsf{KEY}(x), and
- for all vertices z in T_R, \mathsf{KEY}(z) >\mathsf{KEY}(x).
The height of a BST T with n vertices, denoted \mathsf{height}(T), is the number of vertices on the longest path from the root of T to a leaf. It always holds that \lceil \log_2(n+1)\rceil \leq \mathsf{height}(T) \leq n\ , where the first inequality is an equality if T is a perfectly balanced tree.
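For instance, with n=7 keys a perfectly balanced tree has height \lceil \log_2 8\rceil = 3, whereas a degenerate tree in which every vertex has exactly one child is a chain of height 7=n.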
Let T be a BST and let k be the key we are looking for. That is, we want to decide whether there is a vertex v in T with \mathsf{KEY}(v)=k.
If T is empty: easy, k does not appear in the tree. We only need to signal this event, e.g. by returning false.
If T is not empty, let r be the root of T.
- If k=\mathsf{KEY}(r) we are done.
- If k<\mathsf{KEY}(r), we make a recursive call on the left subtree T_L of T to continue the search, because if there is an element with key k in T, it must be in T_L.
- Analogously, if k>\mathsf{KEY}(r), the search must continue recursively in the right subtree of T; a sketch of the whole search follows.
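The following is a minimal sketch of this search in Python; the node class and its field names (key, left, right) are illustrative choices, not part of the text above.

```python
# Minimal BST node; the class and field names are illustrative.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def bst_search(t, k):
    """Return True if key k appears in the BST rooted at t, and False otherwise."""
    if t is None:                      # empty tree: k does not appear
        return False
    if k == t.key:                     # found at the root
        return True
    if k < t.key:                      # if k is in T, it must be in the left subtree
        return bst_search(t.left, k)
    return bst_search(t.right, k)      # otherwise it must be in the right subtree
```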
The insertion algorithm is also very easy and can be designed with the same reasoning as the search algorithm: if the new key to be inserted is smaller than the key at the root, insert it recursively in the left subtree; otherwise, insert it recursively into the right subtree.
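Here is a possible sketch of this insertion, reusing the Node class from the search sketch above; since the definition uses strict inequalities between keys, this version simply ignores duplicate keys, which is one possible (assumed) policy.

```python
def bst_insert(t, k):
    """Return the root of the BST rooted at t after inserting key k."""
    if t is None:                         # empty spot reached: the new key becomes a leaf here
        return Node(k)
    if k < t.key:
        t.left = bst_insert(t.left, k)    # smaller keys go to the left subtree
    elif k > t.key:
        t.right = bst_insert(t.right, k)  # larger keys go to the right subtree
    # k == t.key: the key is already present, so nothing is done (duplicates ignored)
    return t
```

For example, starting from an empty tree, root = bst_insert(bst_insert(bst_insert(None, 5), 3), 8) builds a tree with 5 at the root, and bst_search(root, 8) then returns True.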
Deleting an element with given key k involves two phases:
- Phase 1: Locating the node with key k (if any) in the tree
- This is just a search (see the corresponding tab).
- Phase 2: Removing that node
- We have three possibilities:
if the node to be removed is a leaf (i.e. both subtrees are empty) then it suffices to remove the node.
if the node x to be removed has only one non-empty subtree, it is also easy to remove the node: lift the non-empty subtree so that the father of x points to the subtree’s root instead of pointing to x.
if the node x to be removed has two non-empty subtrees, say the left subtree T_L and the right subtree T_R, then:
- We find the node y with largest key in T_L.
- We remove y and
- place \mathsf{KEY}(y) where x was.
This works because y has no right subtree (why?), so it has at most one non-empty child and can be removed as in the first two cases. The same would work if instead of T_L one considers T_R and the element with smallest key in it. Experimentally, it is actually beneficial to alternate between the two options.
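A possible sketch of the whole deletion, again reusing the Node class above and implementing the left-subtree variant (replacing x by the largest key of T_L):

```python
def bst_max(t):
    """Rightmost vertex of a non-empty BST: the one with the largest key."""
    while t.right is not None:
        t = t.right
    return t


def bst_delete(t, k):
    """Return the root of the BST after deleting key k, if it is present."""
    if t is None:                           # Phase 1 failed: k is not in the tree
        return None
    if k < t.key:
        t.left = bst_delete(t.left, k)
    elif k > t.key:
        t.right = bst_delete(t.right, k)
    else:                                   # Phase 2: t is the node x to be removed
        if t.left is None:                  # leaf, or only the right subtree is non-empty
            return t.right
        if t.right is None:                 # only the left subtree is non-empty
            return t.left
        y = bst_max(t.left)                 # node with the largest key in T_L
        t.key = y.key                       # place KEY(y) where x was
        t.left = bst_delete(t.left, y.key)  # remove y: it has no right child
    return t
```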
AVL Trees
The main drawback of BSTs is that they can be, or become, strongly unbalanced. AVL trees were the first-ever self-balancing binary search trees. They were introduced in 1962 by Adelson-Velskii and Landis (hence the acronym AVL). AVL trees are still commonly used in applications like databases and file systems, where read and write operations need to be fast and efficient.
An AVL tree is a binary search tree that is either empty or
- its left and right subtrees, say L and R respectively, are both AVL trees, and
- the heights of L and R differ by at most one: |\mathsf{height}(R)-\mathsf{height}(L)|\leq 1\ .
The difference of the two subtree heights is the node’s balance factor. Let \mathsf{bal}(x) be the height of the right subtree of an arbitrary vertex x minus the height of its left subtree; then, by construction, in an AVL tree we must have \mathsf{bal}(x) \in\{-1,0,+1\}. Therefore, any node v in an AVL tree can only be of three types:
- neutral: the heights of the subtrees are equal,
- left-heavy: the left subtree has a greater height, or
- right-heavy: the right subtree is taller.
For instance, the following BST is not an AVL tree, because not all of its balance factors are in \{-1,0,+1\}.
Implementations of AVL trees usually store additional information at each vertex, namely its height and/or its balance (\mathsf{bal}(\cdot)), to avoid costly recomputations.
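As a minimal sketch of such a representation (in Python, with illustrative names), one can store the height in each vertex and derive the balance factor from it, following the right-minus-left convention used above:

```python
class AVLNode:
    """BST node that also stores its height, so that bal() costs O(1)."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.height = 1                 # height of a single vertex; an empty tree has height 0


def height(t):
    return t.height if t is not None else 0


def update_height(t):
    t.height = 1 + max(height(t.left), height(t.right))


def bal(t):
    """Balance factor: height of the right subtree minus height of the left one."""
    return height(t.right) - height(t.left)
```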
Searching is done exactly as in any other BST. The difference is that, by the previous lemma, the cost of any search for a key in an AVL tree of size n is O(\log n), even in the worst case.
Insertion at first acts as its counterpart for generic BSTs, but then we need to perform extra work to maintain the balance invariant of AVLs.
Indeed, an insertion done as in a generic BST might invalidate the balance of some nodes, but only of nodes on the path from the root to the inserted vertex. (Why?)
If the balance condition at some node x no longer holds (\mathsf{bal}(x) can only become +2 or -2), we must do something to fix it and re-establish the invariant. This is done via left and right rotations (or combinations of them).
After the insertion of a vertex in an AVL tree, at most 2 rotations are enough to re-establish the balance condition at every vertex.
Let root denote the deepest vertex that does not respect the balance condition (before any rotation), and pivot its child along the path towards the newly added vertex.
A single rotation is enough to restore the invariant when an unbalanced root has a pivot that is heavy in the same direction; depending on whether the root was left-heavy or right-heavy, this is a right or a left rotation (sketched in code after the two cases below).
Below, the dotted lines denote the new vertex added.
- Right rotation
- Left rotation
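Here is a possible sketch of the two single rotations, reusing height and update_height from the AVL sketch above; the names rotate_right and rotate_left are illustrative.

```python
def rotate_right(root):
    """Single right rotation: the left child (pivot) becomes the new subtree root."""
    pivot = root.left
    root.left = pivot.right        # the pivot's right subtree is re-attached under root
    pivot.right = root
    update_height(root)            # root is now below pivot, so update it first
    update_height(pivot)
    return pivot


def rotate_left(root):
    """Single left rotation: the mirror image of rotate_right."""
    pivot = root.right
    root.right = pivot.left
    pivot.left = root
    update_height(root)
    update_height(pivot)
    return pivot
```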
A double rotation is needed when an unbalanced root has a pivot that is heavy in the opposite direction. In this case, if the root was left-heavy and the pivot right-heavy, the invariant can be restored by doing a left rotation and then a right rotation. The case when the root was right-heavy and the pivot left-heavy is symmetric: first a right rotation, then a left rotation. A combined code sketch is given below.
- Left-right rotation
- Right-left rotation
- Analogous to the case above, just symmetric.
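The following sketch combines the single and double cases into one rebalancing step, reusing bal, rotate_left and rotate_right from the sketches above; it assumes it is called on a vertex whose balance has just become -2 or +2.

```python
def rebalance(root):
    """Restore the AVL invariant at root when bal(root) has become -2 or +2."""
    if bal(root) < -1:                            # root is left-heavy
        if bal(root.left) > 0:                    # pivot is right-heavy: left-right case
            root.left = rotate_left(root.left)
        return rotate_right(root)                 # single right rotation otherwise
    if bal(root) > 1:                             # root is right-heavy: symmetric
        if bal(root.right) < 0:                   # pivot is left-heavy: right-left case
            root.right = rotate_right(root.right)
        return rotate_left(root)
    return root                                   # already balanced: nothing to do
```

An insertion in an AVL tree would then first insert the key as in a plain BST and, while returning along the insertion path, call update_height and then rebalance on each visited vertex.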
Credits: the animations of all the rotations are from this blog-post on understanding AVL rotations.
See the theory slides of EDA.