Cracking a Coding Interview: Finding the Kth Largest Value in an Unordered Array

Dumi Loghin
8 min read · Nov 17, 2020

This is the first tutorial in a series on “cracking a coding interview”. In each tutorial, I will show you how to solve a coding problem, starting from a simple solution and gradually arriving at an efficient one. This is what many interviewers are looking for in good candidates.

In this tutorial, we will find the Kth largest value in an unordered array.

Problem

Given an unsorted array of N integers, find the Kth largest value. You are not allowed to use a built-in sorting function.

Example

[6, 7, 2, 1, 4, 5, 3], K = 2

The Kth (2nd) largest value is 6.

1st Approach

Obviously, it would be much easier if the list (or array) were sorted in descending order. In that case, we would just pick the Kth element. Since we are not allowed to use a built-in sorting function, such as qsort() in C, we need to implement our own sort. A quick implementation is selection sort.

// helper used by all the snippets in this article: exchange a[i] and a[j]
void swap(int* a, int i, int j) {
    int tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
}

// selection sort in descending order
void selection_sort(int* a, int n) {
    int i, j;
    for (i = 0; i < n-1; i++) {
        // find the largest value in a[i..n-1] and move it to position i
        int i_max = i;
        for (j = i + 1; j < n; j++)
            if (a[j] > a[i_max])
                i_max = j;
        if (i_max != i)
            swap(a, i, i_max);
    }
}

// we suppose 1 <= k <= n
int find_k_max_v1(int* a, int n, int k) {
    selection_sort(a, n);
    return a[k-1];
}

Selection sort in descending order consists of N-1 steps. At each step I (1 ≤ I ≤ N-1), we find the largest (maximum) value in the sub-array [I:N] and put it on position I. To find this value, we need to iterate through N-I+1 values. Hence, the complexity of this algorithm is O(N²). This is not great: we need to do better.
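
For reference, here is a minimal driver (my addition, not part of the original code) that runs this first version on the example array:

#include <stdio.h>

int main(void) {
    int a[] = {6, 7, 2, 1, 4, 5, 3};
    int n = sizeof(a) / sizeof(a[0]);
    // note: find_k_max_v1 sorts the array in place
    printf("2nd largest = %d\n", find_k_max_v1(a, n, 2)); // prints 6
    return 0;
}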

2nd Approach

There are faster sorting algorithms, such as quick sort, merge sort, and heap sort, which run in O(N·logN) time (for quick sort, O(N·logN) on average and O(N²) in the worst case). Quick sort is often preferred over merge sort because it sorts in place, so it needs no additional memory. However, I will go with heap sort because the 3rd approach I am going to present uses a heap. In essence, the second approach is the same as the first one, except that the sorting is faster.

int find_k_max_v2(int* a, int n, int k) {
    heap_sort(a, n);
    return a[k-1];
}

The Heap Data Structure and Heap Sort

A min-heap is a complete binary tree with the property that the value of any node is no larger than the values of its children. Similarly, a max-heap is a complete binary tree with the property that the value of any node is no smaller than the values of its children.

This property means that the root of a min-heap contains the smallest value of the tree. Hence, we can use this property to perform sorting in N steps. At each step I, we extract the root of the heap and put it on position I in the sorted array. Then, we take the last leaf of the heap and set it as the root. Next, we make sure the heap property is preserved, by pushing down the value of the root, if necessary.

Let's take our example. The min-heap corresponding to the input array [6, 7, 2, 1, 4, 5, 3] is shown in Figure 1. Being a complete tree, it can be represented as an array: [1, 2, 3, 7, 4, 6, 5].

Figure 1. The min-heap corresponding to our example array

In this array representation, we need a way to compute the position (or index) of the children and the parent of a given node. Let's consider that the array is zero-based, as in C. Then, a node at index I has its left and right children at indexes 2I+1 and 2I+2, respectively, and its parent at index (I-1)/2.
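
The heap routines below use three small index helpers (index_left_child, index_right_child, index_parent); based on the formulas above, they can be written as:

// zero-based array representation of a complete binary tree
int index_left_child(int idx)  { return 2 * idx + 1; }
int index_right_child(int idx) { return 2 * idx + 2; }
int index_parent(int idx)      { return (idx - 1) / 2; }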

Going back to our example, let's see how to push down a value in the heap to preserve the heap property. Suppose we extract the root (value 1), take the last node (value 5) and put its value in the root, as shown in Figure 2. Now, 5 is greater than both 2 and 3 (its children). So we take the minimum of these three values (5, 2, 3) and swap it with 5. In this case, we swap 5 and 2: 2 becomes the new root and 5 goes one level down. We repeat this until there is no need to swap or we reach the leaf level. In our case, we then need to swap 5 and 4, after which 5 becomes a leaf.

Figure 2. Pushing down a value in a min-heap

Below is the C code corresponding to this algorithm.

void heap_push_down(int* heap, int n, int idx) {
    while (idx < n) {
        int idxl = index_left_child(idx);
        int idxr = index_right_child(idx);
        if (idxl >= n) // both children outside the heap?
            return;
        if (idxr >= n) { // only the right child outside the heap?
            if (heap[idx] > heap[idxl])
                swap(heap, idx, idxl);
            return;
        }
        else {
            if (heap[idx] > heap[idxl]) {
                if (heap[idxl] > heap[idxr]) {
                    // right child is the smallest of the three
                    swap(heap, idx, idxr);
                    idx = idxr;
                }
                else {
                    // left child is the smallest of the three
                    swap(heap, idx, idxl);
                    idx = idxl;
                }
            }
            else {
                if (heap[idx] > heap[idxr]) {
                    swap(heap, idx, idxr);
                    idx = idxr;
                }
                else // heap property holds, we are done
                    return;
            }
        }
    }
}

We saw how to extract the root and preserve the heap property. What about actually building the heap? Let's suppose we have an array H representing a heap of size NH. We want to add a new element to obtain a heap of size NH+1. The natural way to do this is to add the new element at position NH (zero-based) and then make sure the heap property is preserved. For this, we may need to push the value up in the heap.

For example, suppose we have the min-heap corresponding to [6, 7, 2, 1, 4], which is [1, 2, 6, 7, 4], and we now need to add the value 5, as shown in Figure 3. 5 is smaller than its parent, so we need to swap them. We repeat this process until there is no need to swap or we reach the root.

Figure 3. Pushing up a value in a min-heap

Below is the C code corresponding to this algorithm.

void heap_push_up(int* heap, int n, int idx) {
    while (idx > 0) {
        int idxp = index_parent(idx);
        // swap with the parent while the value is smaller than the parent
        if (heap[idx] < heap[idxp]) {
            swap(heap, idx, idxp);
            idx = idxp;
        }
        else
            return;
    }
}

With these two heap operations, we can now write heap sort, as shown below.

// note: requires <stdlib.h> for malloc/free and <stdio.h> for printf
void heap_sort(int* a, int n) {
    int i;
    int* h = (int*)malloc(n * sizeof(int));
    if (!h) {
        printf("Error allocating heap!\n");
        return;
    }
    // build the heap by inserting the elements one by one
    int nh = 1; // size of the heap
    h[0] = a[0];
    for (i = 1; i < n; i++) {
        h[i] = a[i];
        nh++;
        heap_push_up(h, nh, i);
    }
    // sort in descending order: repeatedly extract the minimum
    // and place it at the end of the array
    for (i = n-1; i >= 0; i--) {
        a[i] = h[0];
        h[0] = h[nh-1];
        nh--;
        heap_push_down(h, nh, 0);
    }
    free(h);
}

In the first step, we build the heap by pushing up each new element (value). In the second step, we repeatedly extract the root and place it on its corresponding position in the sorted array. Note that this heap sort implementation needs an extra array of the same size as the input array.
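
As a quick sanity check (my addition, assuming the snippets above are compiled together), sorting the example array should print it in descending order:

#include <stdio.h>

int main(void) {
    int a[] = {6, 7, 2, 1, 4, 5, 3};
    int n = sizeof(a) / sizeof(a[0]);
    heap_sort(a, n);
    for (int i = 0; i < n; i++)
        printf("%d ", a[i]); // expected output: 7 6 5 4 3 2 1
    printf("\n");
    return 0;
}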

3rd Approach

Can we do better than O(N·logN) for finding the Kth largest value? What if we maintain a data structure with K elements whose smallest element is always directly accessible? A min-heap gives us exactly that. So the third approach consists of two phases. In the first phase, we use the first K elements of the array to build a min-heap of size K. In the second phase, for each of the remaining N-K values, we compare it with the root of the heap and, if it is greater, we replace the root with this value and push it down to restore the heap property. At the end, the root of the heap is the Kth largest value.

In our example, the values are [6, 7, 2, 1, 4, 5, 3] and K = 2. We use the first two values to build a min-heap, [6, 7]. All the other values are smaller than 6, so we never modify the heap, and in the end the 2nd largest value is 6. Now, suppose K = 3. Then we build the heap [2, 7, 6]. 1 changes nothing, but 4 > 2, so we replace the root 2 with 4; the heap property is preserved. Then 5 > 4, so we replace 4 with 5 as the root, and the heap property is again preserved. Since 3 < 5, the 3rd largest value is 5.

The code corresponding to this third approach is:

int find_k_max_v3(int* a, int n, int k) {
    int i, nh;
    // build a min-heap using the first k values
    int* h = (int*)malloc(k * sizeof(int));
    if (!h) {
        printf("Error allocating heap of size k (%d)\n", k);
        return 0;
    }
    nh = 1;
    h[0] = a[0];
    for (i = 1; i < k; i++) {
        h[i] = a[i];
        nh++;
        heap_push_up(h, nh, i);
    }
    // process the remaining n-k values
    for (; i < n; i++) {
        if (a[i] > h[0]) {
            h[0] = a[i];
            heap_push_down(h, nh, 0);
        }
    }
    int ret = h[0]; // the root is the kth largest value
    free(h);
    return ret;
}

Building a heap of size NH bottom-up is known to take O(NH); building it by repeated insertions, as we do here, takes O(NH·logNH) in the worst case. Pushing down a value in a heap of size NH is O(logNH). Thus, the complexity of our third approach is O(K·logK + (N-K)·logK) = O(N·logK), which beats O(N·logN), especially when K is much smaller than N.

Performance Comparison

Finally, I compare the execution times of the three approaches. I ran them on an Intel Core i5 CPU with arrays of sizes from 10,000 to 100,000 elements, setting K = 0.75N. Figure 4 shows how badly the first approach scales as N grows. (A sketch of a possible timing harness is given after the figures.)

Figure 4. Execution time

Figure 5 shows the same timing results on a log scale, to better see the difference between the second and third approaches.

Figure 5. Execution time in log scale
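
The benchmark driver itself is not listed in this post, but a minimal harness along these lines (my sketch, using clock() from <time.h> and a fresh copy of the input for each run, since the first two versions sort the array in place) is enough to reproduce the comparison:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

// hypothetical helper: time one version on a private copy of the data
double time_find_k_max(int (*f)(int*, int, int), const int* a, int n, int k) {
    int* copy = (int*)malloc(n * sizeof(int));
    memcpy(copy, a, n * sizeof(int));
    clock_t start = clock();
    f(copy, n, k);
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    free(copy);
    return secs;
}

int main(void) {
    srand(42);
    for (int n = 10000; n <= 100000; n += 10000) {
        int k = (int)(0.75 * n);
        int* a = (int*)malloc(n * sizeof(int));
        for (int i = 0; i < n; i++)
            a[i] = rand();
        printf("n=%d v1=%.3fs v2=%.3fs v3=%.3fs\n", n,
               time_find_k_max(find_k_max_v1, a, n, k),
               time_find_k_max(find_k_max_v2, a, n, k),
               time_find_k_max(find_k_max_v3, a, n, k));
        free(a);
    }
    return 0;
}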

Looking forward to your comments and suggestions!
