PolarSPARC



Bhaskar S 10/17/2020


Ever had the need to either choose the highest or the lowest valued item from a collection of items ???

The following are some of the examples pertaining to the above question:

It is in these situations, the Binary Heap data structure comes in handy.

A Binary Heap uses the Binary Tree data structure and the must satisfy the following two properties:

Let us go through an example to populate a Binary Max Heap. For our example, we will populate the values: 19, 7, 18, 12, 15, 2, 22 respectively.

To start with, the Binary Max Heap is an empty tree. The first item to add is 19. Once added, the tree would have a single root node as shown in the following illustration:

Item 19
Figure-1

The second item to add is 7. It will be added as the left child of the root node as shown in the following illustration:

Item 7
Figure-2

The third item to add is 18. It will be added as the right child of the root node as shown in the following illustration:

Item 18
Figure-3

As can be seen from the Binary Max Heap illustration above, both the relational and structural properties are maintained.

The fourth item to add is 12. It will be added to the tree as shown in the following illustration:

Item 12
Figure-4

With the insertion of item 12 above, the Binary Max Heap violates the relational property - the value of the parent node (node 7) of node 12 is smaller in value. This means, we need to swap the values to maintain the relational property as shown in the following illustration:

Item Swap
Figure-5

The fifth item to add is 15. It will be added to the tree as shown in the following illustration:

Item 15
Figure-6

With the insertion of item 15 above, the Binary Max Heap violates the relational property - the value of the parent node (node 12) of node 15 is smaller in value. This means, we need to swap the values to maintain the relational property as shown in the following illustration:

Item Swap
Figure-7

The sixth item to add is 2. It will be added to the tree as shown in the following illustration:

Item 2
Figure-8

The last item to add is 22. It will be added to the tree as shown in the following illustration:

Item 22
Figure-9

With the insertion of item 22 above, the Binary Max Heap violates the relational property - the value of the parent node (node 18) of node 22 is smaller in value. This means, we need to swap the values to maintain the relational property as shown in the following illustration:

Item Swap
Figure-10

With the swap of item 22 above, the Binary Max Heap again violates the relational property - the value of the parent node (node 19) of node 22 is smaller in value. This means, we need to swap the values to maintain the relational property as shown in the following illustration:

Item Swap
Figure-11

With the swap of item 22 above, the Binary Max Heap is in a valid state as shown in the following illustration:

Item Swap
Figure-12

A Binary Heap (Max or Min) tree can be implemented using an array. Assuming A is an array, the root of the Binary Heap tree will be at A[0]. For any element at index I, the left child will be at A[2*I+1] and the right child will be at A[2*I+2]. The parent of any element at index I can be located at A[(I-1)//2], where // represents integer division.

With this information, the Binary Max Heap we just created can be represented in an array form as shown in the following illustration:

Array Represntation
Figure-13

In the following sections, we will show how one could implement a Binary Max Heap using Python.

We will use the Python list (array) for storing the items of the Binary Max Heap.

Each item in the Binary Max Heap is an instance of KvPair, which represents a key-value pair and contains the following members:

The following is the implementation of the KvPair class using Python:

KvPair.py
#
# Name:   Data Structures - Key-Value Pair
# Author: Bhaskar S
# Blog:   https://www.polarsparc.com
#

class KvPair:
    __slots__ = ['_key', '_value']

    def __init__(self, k, v):
        self._key = k
        self._value = v

    def key(self):
        return self._key

    def value(self):
        return self._value

    def __gt__(self, other):
        return self._key > other.key()

    def __lt__(self, other):
        return self._key < other.key()

    def __repr__(self):
        return '(%s, %s)' % (self._key, self._value)

When an item is added to the Binary Max Heap, we need to ensure the relational property is strictly maintained - that is the item at the parent node of the newly added item is larger in value. If not, we swap the items. When an item is appended to the Python list, we know its index location (i). Using the formula (i - 1) // 2, we can calculate the index of its parent in the Python list.

The following is the code implementation to ensure the relational property of the Binary Max Heap is maintained by bubbling up the item with the largest value:

_adjust_up()
# This method will ensure the relation property of the binary heap is maintained
# once a new item is added - that is the item at the parent node of the just added
# item is larger in value
def _adjust_up(self):
    i = self._size - 1
    while i > 0:
        j = (i - 1) // 2  # parent
        if self._data[i] > self._data[j]:
            self._swap(i, j)
        else:
            break
        i = j

The worst-case time complexity (Big-O notation) for adding an item into a Binary Max Heap is O(log n).

Moving on to pop the item at the root (with the max value) from a Binary Max Heap. When the item at the root is popped out (removed), it needs to be replaced by the item at the farthest leaf node of the Binary Max Heap (last item in the array). This may violate the relational property of the Binary Max Heap and the nodes need to be re-adjusted starting from the root.

We will use our example Binary Max Heap to explain the pop operation as shown in the following illustration:

Item Pop
Figure-14

When the item 22 at the root is removed from a Binary Max Heap, it will be replaced by the item 18 in the last leaf node as shown in the following illustration:

Item Move
Figure-15

With the last item 18 is moved to the root, the Binary Max Heap violates the relational property - the value of the parent node (node 18) is smaller in value than its right child (node 19). This means, we need to swap the values to maintain the relational property as shown in the following illustration:

Item Swap
Figure-16

With the swap of item 18 above, we continue the process down the Binary Max Heap till it reaches a valid state as shown in the following illustration:

Valid State
Figure-17

The following is the code implementation to ensure the relational property of the Binary Max Heap is maintained by bubbling down the item with the smaller value:

_adjust_down()
# This method will ensure the relation property of the binary heap is maintained
# once an item is popped off the root and is replaced with the item at the last
# leaf node - that is the item at either left or right is smaller in value
def _adjust_down(self):
    i = 0
    while i < (self._size - 1):
        lc = (i * 2) + 1  # left child
        rc = (i * 2) + 2  # right child
        if lc > (self._size - 1):
            break
        j = lc
        if rc <= (self._size - 1) and self._data[rc] > self._data[lc]:
            j = rc
        if self._data[j] > self._data[i]:
            self._swap(i, j)
        else:
            break
        i = j

When the last item is moved to the root of the Binary Max Heap, we need to ensure the relational property is strictly maintained - that is the item at the parent node (at index i) is larger in value compared to its left child located at (2 * i + 1) and its right child located at (2 * i + 2). If not, we swap the items.

The worst-case time complexity (Big-O notation) for removing an item into a Binary Max Heap is O(log n).

The following is the complete source code for the Binary Max Heap data structure implemented using Python:

BinaryMaxHeap.py
#
# Name:   Data Structures - Binary Max Heap
# Author: Bhaskar S
# Blog:   https://www.polarsparc.com
#

from DataStructures.KvPair import KvPair


class BinaryMaxHeap:
    __slots__ = ['_data', '_size']

    def __init__(self):
        self._data = []
        self._size = 0

    def add_item(self, item):
        if not isinstance(item, KvPair):
            raise Exception("item is not a KvPair")

        self._data.append(item)
        self._size += 1

        self._adjust_up()

    def pop_item(self):
        if self._size == 0:
            raise Exception("Heap is empty !!!")

        res = self._data[0]

        last = self._data.pop()
        self._data[0] = last
        self._size -= 1

        self._adjust_down()

        return res

    def len(self):
        return self._size

    def max_item(self):
        if self._size == 0:
            raise Exception("Heap is empty !!!")

        return self._data[0]

    def print(self):
        print(self._data)

    # Method(s) for internal use only

    def _swap(self, i, j):
        self._data[i], self._data[j] = self._data[j], self._data[i]

    # This method will ensure the relation property of the binary heap is maintained
    # once a new item is added - that is the item at the parent node of the just added
    # item is larger in value
    def _adjust_up(self):
        i = self._size - 1
        while i > 0:
            j = (i - 1) // 2  # parent
            if self._data[i] > self._data[j]:
                self._swap(i, j)
            else:
                break
            i = j

    # This method will ensure the relation property of the binary heap is maintained
    # once an item is popped off the root and is replaced with the item at the last
    # leaf node - that is the item at either left or right is smaller in value
    def _adjust_down(self):
        i = 0
        while i < (self._size - 1):
            lc = (i * 2) + 1  # left child
            rc = (i * 2) + 2  # right child
            if lc > (self._size - 1):
                break
            j = lc
            if rc <= (self._size - 1) and self._data[rc] > self._data[lc]:
                j = rc
            if self._data[j] > self._data[i]:
                self._swap(i, j)
            else:
                break
            i = j


#
# Main function
#
def main():
    heap = BinaryMaxHeap()

    values = [19, 7, 18, 12, 15, 2, 22]
    for val in values:
        kv = KvPair(val, val)
        heap.add_item(kv)

    print('Size of the Binary Max Heap: %d' % heap.len())

    print('Max item of the Binary Max Heap: %s' % heap.max_item())

    heap.pop_item()

    print('Current items in the Binary Max Heap (after pop):')
    
    heap.print()


#
# Invoke main()
#
main()

Executing the above Python code will generate the following output:

Output.1

Size of the Binary Max Heap: 7
Max item of the Binary Max Heap: (22, 22)
Current items in the Binary Max Heap (after pop):
[(19, 19), (15, 15), (18, 18), (7, 7), (12, 12), (2, 2)]

One can also have a Binary Min Heap data structure where the value at an ancestor node is smaller than the values at its children.

The following is the complete source code for the Binary Min Heap data structure implemented using Python:

BinaryMinHeap.py
#
# Name:   Data Structures - Binary Min Heap
# Author: Bhaskar S
# Blog:   https://www.polarsparc.com
#

from DataStructures.KvPair import KvPair


class BinaryMinHeap:
    __slots__ = ['_data', '_size']

    def __init__(self):
        self._data = []
        self._size = 0

    def add_item(self, item):
        if not isinstance(item, KvPair):
            raise Exception("item is not a KvPair")

        self._data.append(item)
        self._size += 1

        self._adjust_up()

    def pop_item(self):
        if self._size == 0:
            raise Exception("Heap is empty !!!")

        res = self._data[0]

        last = self._data.pop()
        self._data[0] = last
        self._size -= 1

        self._adjust_down()

        return res

    def len(self):
        return self._size

    def min_item(self):
        if self._size == 0:
            raise Exception("Heap is empty !!!")

        return self._data[0]

    def print(self):
        print(self._data)

    # Method(s) for internal use only

    def _swap(self, i, j):
        self._data[i], self._data[j] = self._data[j], self._data[i]

    # This method will ensure the relation property of the binary heap is maintained
    # once a new item is added - that is the item at the parent node of the just added
    # item is smaller in value
    def _adjust_up(self):
        i = self._size - 1
        while i > 0:
            j = (i - 1) // 2  # parent
            if self._data[i] < self._data[j]:
                self._swap(i, j)
            else:
                break
            i = j

    # This method will ensure the relation property of the binary heap is maintained
    # once an item is popped off the root and is replaced with the item at the last
    # leaf node - that is the item at either left or right is larger in value
    def _adjust_down(self):
        i = 0
        while i < (self._size - 1):
            lc = (i * 2) + 1  # left child
            rc = (i * 2) + 2  # right child
            if lc > (self._size - 1):
                break
            j = lc
            if rc <= (self._size - 1) and self._data[rc] < self._data[lc]:
                j = rc
            if self._data[j] < self._data[i]:
                self._swap(i, j)
            else:
                break
            i = j


#
# Main function
#
def main():
    heap = BinaryMinHeap()

    values = [19, 7, 18, 12, 15, 2, 22]
    for val in values:
        kv = KvPair(val, val)
        heap.add_item(kv)

    print('Size of the Binary Min Heap: %d' % heap.len())

    print('Min item of the Binary Min Heap: %s' % heap.min_item())

    heap.pop_item()

    print('Current items in the Binary Min Heap (after pop):')

    heap.print()


#
# Invoke main()
#
main()

Executing the above Python code will generate the following output:

Output.2

Size of the Binary Min Heap: 7
Min item of the Binary Min Heap: (2, 2)
Current items in the Binary Min Heap (after pop):
[(7, 7), (12, 12), (18, 18), (19, 19), (15, 15), (22, 22)]

Python provides a module called heapq that implements the features of the Binary Min Heap.

The following shows the use of the heapq module using the Python REPL:

Output.3

>>> import heapq
>>> values = [19, 7, 18, 12, 15, 2, 22]
>>> heapq.heapify(values)
>>> values
[2, 7, 18, 12, 15, 19, 22]
>>> heapq.heappop(values)
2
>>> values
[7, 12, 18, 22, 15, 19]
>>> heapq.heappush(values, 4)
>>> values
[4, 12, 7, 22, 15, 19, 18]
>>> heapq.nsmallest(3, values)
[4, 7, 12]


© PolarSPARC