Jonathan. Frech’s WebBlog

Heap­sort (#201)

Jonathan Frech,

Introduction

Continuing my journey im­ple­ment­ing var­i­ous sorting algorithms in C, in this post I am departing from the most well-known algorithms and im­ple­ment­ing one of my personal favourites — heap­sort.
Download the C source file here: heap­sort.c.

Contrary to algorithms like quick­sort which im­me­di­ate­ly jump into action sorting the given array, heap­sort operates on a data structure called a heap which it efficiently transforms into a sorted list. How­ev­er, as most arrays are not of the heap structure, heap­sort first needs to transform a given array into a heap. Thus heap­sort is a two-step process — first creating a heap and then sorting said heap.

Sorting can be applied to an infinite num­ber of ob­jects pro­vid­ed there is an order defined among them. How­ev­er, not much is gained from changing the underlying value one is sorting which is why in this post I will on­ly focus on sorting integers — technically even on­ly integers in the C type sense; $n\in\mathbb{Z},-2^{31} \leq n < 2^{31}$$n\in\mathbb{Z},-2^{31} \leq n < 2^{31}$$n\in\mathbb{Z},-2^{31} \leq n < 2^{31}$.

The Heap

A heap is a special type of binary tree that satisfies the heap prop­er­ty — every parent node’s value is not less than its child node’s (if existent) values. Fur­ther­more, a heap is maximally filled at every level but pos­si­bly the last where the elements are as far left as possible.
From these prop­er­ties it follows that the greatest value will be located at the root node.

One can also define a heap such that the root node will house the smallest element; such a heap would lead to a list sorted in descending order (more on that later).

      ( 86 )      
     /      \     
  (31)      (64)  
  /  \      /     
(20)(-4)  (17)    

A heap containing integers.
-=-

Because of the above prop­er­ties, a heap can be nicely packed to­geth­er into an array by juxtaposing the tree’s layers. The resulting array will not have any holes, as the heap is maximally filled.
In this array rep­re­sent­ation, a node at index 𝑛 ≠ 0 has its parent node at index $N=(n-1)/2$$N=(n-1)/2$$N=(n-1)/2$ and a node at index 𝑛 has its children at indices $n_1=2\cdot n+1$$n_1=2\cdot n+1$$n_1=2\cdot n+1$ and $n_2=2\cdot n+2$$n_2=2\cdot n+2$$n_2=2\cdot n+2$, if pre­sent.

      ( 86 )     <---+                       
     /      \        |                       
  (31)      (64) <---|-------+               
  /  \      /        |       |               
(20)(-4)  (17)   <---|-------|----------+    
                     |       |          |    
                     v       v          v    
                    [86,   31,64,   20,-4,17]

An integer heap's array representation.

Trickling

Using this rep­re­sent­ation, any list can be thought of as a maximally filled binary tree. Heap­sort’s first step is to take such an array and transform it such that it satisfies the heap prop­er­ty. To do so, one progressively trickles down small elements to the bottom. Starting at the bottom-most full layer at the right-most node with at least one child, a parent is swapped with the largest of its children, if smaller.

      ( -4 )            ( -4 )            { -4 }            ( 86 )    
     /      \          /      \          /      \          /      \   
  (17)      {31} => {17}      (86) => (64)      {86} => (64)      (-4)
  /  \      /       /  \      /       /  \      /       /  \      /   
(20)(64)  {86}    (20){64}  (31)    (20)(17)  (31)    (20)(17)  (31)  

Using recursive trickling to transform an upside-down heap into a heap.

Sorting

Having transformed a given array into the heap structure, one can remove the root node, as it will con­tain the largest value. How­ev­er, once removed the resulting ob­ject is not a heap anymore — not even a tree.
To fix the hole where the largest element was ripped out, one replaces it with the bottom-most right-most element and trickles said element, lead­ing to — once again — a heap. Iterative ap­pli­ca­tion of the described algorithm allows one to sort a list.

       [ 86 ]            { 17 }            [ 64 ]            { -4 }    
      /      \          /      \          /      \          /      \   
   (31)      (64) => (31)      {64} => (31)      (17) => {31}      (17)
   /  \      /       /  \              /  \              /             
 (20)(-4)  [17]    (20)(-4)          (20)[-4]          (20)            

       ( 31 )            [ 31 ]            { -4 }            [ 20 ]    
      /      \          /      \          /      \          /      \   
=> {-4}      (17) => (20)      (17) => {20}      (17) => (-4)      [17]
   /                 /                                                 
 {20}              [-4]                                                

       [ 17 ]             [ -4 ]                                       
      /                                                                
=> [-4]            =>                                                  
                                                                       
Sorted list: [86, 64, 31, 20, 17, -4]

Heapsort in action.
Curly braces represent trickling swaps,
square brackets element removal and tree restoration.

An interesting prop­er­ty of heap­sort is that it can be done in-place — not requiring any more memory than the original array. Because of this, I defined my heap so that the root node has the largest value — swapping to fill the hole places the heap’s largest value at the array’s end. The resulting array is thus sorted in ascending order with­out requiring to re­verse the array.

Timing

Heap­sort’s average algorithmic time com­plex­i­ty is $\mathcal{O}(n\cdot\ln n)$$\mathcal{O}(n\cdot\ln n)$$\mathcal{O}(n\cdot\ln n)$. To ensure my im­ple­men­ta­tion does not suffer from any obvious im­ple­men­ta­tion oversight, I timed it on var­i­ous ran­dom­ly gen­er­ated in­te­ger arrays of dif­fer­ing sizes.

1.5s ^                                                              *
     |                                                      ******** 
     |                                              ********         
     |                                     *********                 
     |                             ********                          
     |                    *********                                  
     |          **********                                           
0.0s ***********---------------------------------------------------->
     0 elements                                      6300000 elements

Timing the sorting of 64 lists of lengths {0, 100000, ..., 6300000}.

Compiled and run using gcc heapsort.c -o h; ./h
on an ArchLabs 2018 system with an Intel i7-4790K CPU @ 4.00GHz.

In­sta­bil­i­ty

Heap­sort is a fine sorting algorithm, especially if memory is sparse. Being able to per­form in-place, its average algorithmic memory com­plex­i­ty is $\mathcal{O}(1)$$\mathcal{O}(1)$$\mathcal{O}(1)$. Its average algorithmic time com­plex­i­ty — as stated above — is $\mathcal{O}(n\cdot\ln n)$$\mathcal{O}(n\cdot\ln n)$$\mathcal{O}(n\cdot\ln n)$.
One drawback of heap­sort, how­ev­er, is its in­sta­bil­i­ty. When on­ly sorting integers stability does not matter, but when sorting tuples according to on­ly one entry or sorting using another criterium that does not consider the element’s full entropy — meaning there exist unequal elements with equal ordering —, stability may be­come important. Stability is a sorting algorithm’s prop­er­ty that elements of equal ordering remain in the order they originally were in.

   [ 0]         [ 1]        [-1]
   /  \    =>   /      =>       
(-1)  [ 1]   [-1]               

Sorted list: [0, 1, -1]

Sorting the integer array [0, -1, 1] using heapsort
with a classically oriented heap and the integer's absolute
value as an ordering criterium, showing heapsort's instability.