Heaps

Complete and quasi-complete trees

A binary tree is complete if all leaves are at the same level.

A quasi-complete binary tree of depth d is a binary tree such that, up to depth d-1, it is a complete binary tree, and all the leaves at depth d are accumulated as far to the left as possible.

Depth of (quasi-)complete trees

A complete binary tree of depth d has exactly 2^{d+1}-1 nodes.

A quasi-complete binary tree of depth d has between 2^d and 2^{d+1}-1 vertices.

In a complete tree there is 1 vertex at depth 0 (the root itself), 2 vertices at depth 1, 4 vertices at depth 2, …, 2^i vertices at depth i. So in total, a complete tree of depth d has 1+2+2^2+\cdots +2^d=2^{d+1}-1 many vertices.

For a quasi-complete tree the argument is similar: up to depth d-1 there are exactly 2^d-1 vertices. To have a tree of depth d there must be at least 1 and at most 2^d vertices at depth d. So the total number of vertices is between 2^d and 2^{d+1}-1.

In other words: the depth of a (quasi-)complete binary tree with n vertices is \lfloor \log_2 n\rfloor, i.e. \Theta(\log n). Indeed, if the depth is d then 2^d \leq n \leq 2^{d+1}-1, and taking logarithms gives d = \lfloor \log_2 n\rfloor.

Implementing quasi-complete trees

We could implement quasi-complete trees with dynamically allocated nodes and three pointers per node (left, right, father)… But it is much easier and more efficient to implement them with vectors!

To make the index arithmetic simpler, to store a quasi-complete tree with n vertices we use a vector A of size n+1 and discard A[0]:

  • A[1] contains the root
  • If 2i\leq n then A[2i] contains the left child of A[i], and if 2i+1\leq n then A[2i+1] contains the right child of A[i].

In other words, for each i\geq 2, the father of A[i] is A[\lfloor i/2\rfloor].

Number of leaves in a quasi-complete tree

A quasi-complete tree with n vertices has \lceil \frac{n}{2}\rceil leaves.

It is easiest to see this when the quasi-complete tree is stored in a vector. The last leaf of the tree is the nth element of the vector, and its father is in position \lfloor \frac{n}{2}\rfloor. Hence all the n-\lfloor \frac{n}{2}\rfloor elements that come after position \lfloor \frac{n}{2}\rfloor must be leaves as well, while none of the elements in positions up to \lfloor \frac{n}{2}\rfloor can be leaves. Since n-\lfloor \frac{n}{2}\rfloor=\lceil \frac{n}{2}\rceil, this concludes the proof.

Heaps

A max-heap (resp. min-heap) is a quasi-complete binary tree such that the label of any vertex is larger (resp. smaller) than or equal to the labels of all of its descendants.

It is easy to see that the root of a max-heap has the largest label among all the vertices in the heap.

If a max-heap is used to implement a max-priority queue, the query for a max element and its priority is trivial: we need only examine the root of the heap.

To remove the maximum:

  • Replace the root of the heap with the last element (the rightmost element in the last level).
  • Re-establish the order by sinking the new root: compare the node with its child of largest label, and exchange them if the node's label is smaller than that of the child; the process is repeated until the order is re-established.

To insert a new element:

  • Add the new element as the rightmost node of the last level of the heap (or as the first element of a new, deeper level).
  • Re-establish the order by floating the newly added element: compare the node with its father, and exchange them if the node's label is larger than that of its father; the process is repeated until the order is re-established.

The following code sketches how to construct a max-heap on a vector with n elements:

// Establish a max-heap order in the
// array v[1..n] of Elem's; Elem == priorities
template <typename Elem>
void heapify(Elem v[], int n) {
    for (int i = n/2; i > 0; --i)
        sink(v, n, i);  // sinks the element in position i to find its place in the vector
}

Clearly this takes \Omega(n) time. For an upper bound, note that the cost of sinking the element in position i is O(\log(n/i)), since the subtree rooted at position i has O(n/i) vertices. So globally the cost of the algorithm to construct a max-heap is \sum_{i=1}^{\lfloor n/2\rfloor} O(\log(n/i))=O\left(\sum_{i=1}^{\lfloor n/2\rfloor} \log(n/i)\right)=O\left(\log \frac{n^{n/2}}{(n/2)!}\right) = O(n), where the last equality follows from Stirling's approximation, \log (n/2)! = (n/2)\log(n/2) - O(n). Therefore the cost of constructing a max-heap is \Theta(n).

Since the height of a heap is \Theta(\log n), the cost of removing the maximum and the cost of insertions is \Theta(\log n).

Heapsort

The sorting algorithm Heapsort (Williams, 1964) sorts an array of n elements by building a min-heap with the n elements and then extracting them, one by one, from the heap.

The originally given array is used to build the heap; Heapsort works in place and needs only a constant amount of auxiliary memory. Since insertions and deletions in heaps have cost O(\log n), the cost of the algorithm is O(n\log n).

For a comparison of Heapsort with other comparison-based sorting algorithms see this page.