hp
toc

Heapsort

2018-08-11, post № 201

C, programming, #algorithms, #sorting

Introduction

Continuing my journey implementing various sorting algorithms in C, in this post I am departing from the most well-known algorithms and implementing one of my personal favourites — heapsort.
Download the C source file here: heapsort.c.

Contrary to algorithms like quicksort which immediately jump into action sorting the given array, heapsort operates on a data structure called a heap which it efficiently transforms into a sorted list. However, as most arrays are not of the heap structure, heapsort first needs to transform a given array into a heap. Thus heapsort is a two-step process — first creating a heap and then sorting said heap.

Sorting can be applied to an infinite number of objects provided there is an order defined among them. However, not much is gained from changing the underlying value one is sorting which is why in this post I will only focus on sorting integers — technically even only integers in the C type sense; n\in\mathbb{Z},-2^{31} \leq n < 2^{31}.

The Heap

A heap is a special type of binary tree that satisfies the heap property — every parent node’s value is not less than its child node’s (if existent) values. Furthermore, a heap is maximally filled at every level but possibly the last where the elements are as far left as possible.
From these properties it follows that the greatest value will be located at the root node.

One can also define a heap such that the root node will house the smallest element; such a heap would lead to a list sorted in descending order (more on that later).

      ( 86 )      
     /      \     
  (31)      (64)  
  /  \      /     
(20)(-4)  (17)    

A heap containing integers.

Because of the above properties, a heap can be nicely packed together into an array by juxtaposing the tree’s layers. The resulting array will not have any holes, as the heap is maximally filled.
In this array representation, a node at index 𝑛 ≠ 𝟢 has its parent node at index N=(n-1)/2 and a node at index 𝑛 has its children at indices n_1=2\cdot n+1 and n_2=2\cdot n+2, if present.

      ( 86 )     <---+                       
     /      \        |                       
  (31)      (64) <---|-------+               
  /  \      /        |       |               
(20)(-4)  (17)   <---|-------|----------+    
                     |       |          |    
                     v       v          v    
                    [86,   31,64,   20,-4,17]

An integer heap's array representation.

Trickling

Using this representation, any list can be thought of as a maximally filled binary tree. Heapsort’s first step is to take such an array and transform it such that it satisfies the heap property. To do so, one progressively trickles down small elements to the bottom. Starting at the bottom-most full layer at the right-most node with at least one child, a parent is swapped with the largest of its children, if smaller.

      ( -4 )            ( -4 )            { -4 }            ( 86 )    
     /      \          /      \          /      \          /      \   
  (17)      {31} => {17}      (86) => (64)      {86} => (64)      (-4)
  /  \      /       /  \      /       /  \      /       /  \      /   
(20)(64)  {86}    (20){64}  (31)    (20)(17)  (31)    (20)(17)  (31)  

Using recursive trickling to transform an upside-down heap into a heap.

Sorting

Having transformed a given array into the heap structure, one can remove the root node, as it will contain the largest value. However, once removed the resulting object is not a heap anymore — not even a tree.
To fix the hole where the largest element was ripped out, one replaces it with the bottom-most right-most element and trickles said element, leading to — once again — a heap. Iterative application of the described algorithm allows one to sort a list.

       [ 86 ]            { 17 }            [ 64 ]            { -4 }    
      /      \          /      \          /      \          /      \   
   (31)      (64) => (31)      {64} => (31)      (17) => {31}      (17)
   /  \      /       /  \              /  \              /             
 (20)(-4)  [17]    (20)(-4)          (20)[-4]          (20)            

       ( 31 )            [ 31 ]            { -4 }            [ 20 ]    
      /      \          /      \          /      \          /      \   
=> {-4}      (17) => (20)      (17) => {20}      (17) => (-4)      [17]
   /                 /                                                 
 {20}              [-4]                                                

       [ 17 ]             [ -4 ]                                       
      /                                                                
=> [-4]            =>                                                  
                                                                       
Sorted list: [86, 64, 31, 20, 17, -4]

Heapsort in action.
Curly braces represent trickling swaps,
square brackets element removal and tree restoration.

An interesting property of heapsort is that it can be done in-place — not requiring any more memory than the original array. Because of this, I defined my heap so that the root node has the largest value — swapping to fill the hole places the heap’s largest value at the array’s end. The resulting array is thus sorted in ascending order without requiring to reverse the array.

Timing

Heapsort’s average algorithmic time complexity is \mathcal{O}(n\cdot\ln n). To ensure my implementation does not suffer from any obvious implementation oversight, I timed it on various randomly generated integer arrays of differing sizes.

1.5s ^                                                              *
     |                                                      ******** 
     |                                              ********         
     |                                     *********                 
     |                             ********                          
     |                    *********                                  
     |          **********                                           
0.0s ***********---------------------------------------------------->
     0 elements                                      6300000 elements

Timing the sorting of 64 lists of lengths {0, 100000, ..., 6300000}.

Compiled and run using gcc heapsort.c -o h; ./h
on an ArchLabs 2018 system with an Intel i7-4790K CPU @ 4.00GHz.

Instability

Heapsort is a fine sorting algorithm, especially if memory is sparse. Being able to perform in-place, its average algorithmic memory complexity is \mathcal{O}(1). Its average algorithmic time complexity — as stated above — is \mathcal{O}(n\cdot\ln n).
One drawback of heapsort, however, is its instability. When only sorting integers stability does not matter, but when sorting tuples according to only one entry or sorting using another criterium that does not consider the element’s full entropy — meaning there exist unequal elements with equal ordering —, stability may become important. Stability is a sorting algorithm’s property that elements of equal ordering remain in the order they originally were in.

   [ 0]         [ 1]        [-1]
   /  \    =>   /      =>       
(-1)  [ 1]   [-1]               

Sorted list: [0, 1, -1]

Sorting the integer array [0, -1, 1] using heapsort
with a classically oriented heap and the integer's absolute
value as an ordering criterium, showing heapsort's instability.
Jonathan Frech's blog; built 2024/08/31 22:59:44 CEST