I think the first time I learnt heaps was in an undergraduate course on Data Structures. Though I understood the basic idea, it was never crystal clear for me to code during a competition.
I took up the task of learning it afresh today. Here is a short gist.
A heap is a data structure that can be used to make fast (constant time) queries of the type ‘what is the maximum element in the array’, with fast updates (logarithmic time).
A heap is usually implemented as a balanced binary tree, but there are other variants like fibonacci heap, which I won’t be discussing here. In the below discussion and code, I explain how I would implement a max-heap. For min-heaps, use the exact same techniques with small changes like flipping the comparison operators.
We need the following operations when implementing a heap:
topElem()
insertElem(t)
deleteTopElem(t)
All the operations should be done in O(logn) time. Internally in a binary-heap, the root element is always atleast as big as both of its children, and this property of the heap exists recursively. (In min-heaps, root is as small, or smaller than both of its children)
It is easy to see that topElem()
can be executed in constant time, since the maximum element would be at the root. (In min-heaps the minimum element would be at the root)
Now, when we need to remove the maximum (or the minimum element in min-heaps), we need to remove the root and preserve the heap property. Removing the root is simple, but what it leaves behind is not so trivial. Either the left, or the right child would be the biggest (or the smallest, in min-heaps) element left now. So, we compare the two. The bigger (or smaller. You get the idea now, I guess) of the two is placed in the root, leaving a void at the place where it existed. So, now we need to correct either the left, or the right sub-tree of the root, until the sub-tree is empty. This operation will be executed O(logn) times, (costing O(1) time, each time). So the complexity is O(logn). Thus, we are done with deleteTopElem()
.
insertElem(t)
is even simpler, we place the new element in the lower-most level of the binary tree, where we have a place, or create a new level if we don’t. Now, it is possible that the parent of the newly inserted element has lost the heap property. Thus, if the parent of the newly inserted element is greater than it, we swap the parent with the newly inserted element, and check if the new-parent has also lost the heap property. We continue this till we reach the root. This operation will be executed O(logn) times, (costing O(1) time, each time). So the complexity is O(logn).
Here is a messy little implementation:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 |
|
If you notice the code, I am inserting elements one by one, costing O(logn) on each insert. And hence, O(nlogn) for n elements. However, if we know the entire set of elements before hand, heap construction can be done in linear time, in a manner similar to insertElem(t)
. Hint: Recursion.
You might also want to overload the comparison operators for custom data-structures.
I used this presentation (ppt) to refresh my memory.