โœŒ๏ธ

Binary search done right: k-th non-present element in sorted array in O(log n)

๐Ÿ’ก
Problem statement: you are given a sorted array of positive integers, you have to identify k-th non-present positive integer element.
ย 
Example:
array = [1, 2, 3, 5, 7, 8, 15]
                 ^ missing 4
                    ^ missing 6
                          ^ missing 9, 10, 11, 12, 13, 14
ย 
Note that by taking k-th element and subtracting its index, we get the number of skipped elements up to this point (plus one):
ย 
array[0] - 0 = 1  # 0 elements skipped + 1
array[1] - 1 = 1  # the same
array[3] - 3 = 5 - 3 = 2 # element `4` is skipped + 1
array[6] - 6 = 15 - 6 = 9 # 8 elements skipped + 1
ย 
Therefore, the sequence is non-decreasing. By using binary search, we can identify the left and right boundaries for the property โ€œthe number of skipped elements is less than or equal to kโ€.
ย 
def find_kth_nonpresent(array: list[int], k: int) -> int:
    left = -1
    right = len(array)
    while right - left > 1:
        mid = (left + right) // 2
        if array[mid] - mid - 1 < k:
            left = mid
        else:
            right = mid
    # discussion continues
ย 
In the above code, we make sure that
  • the left boundary satisfies the search condition. If we mathematically suppose that minus first element is , then we have no skipped elements, which is less than k.
  • the right boundary does not satisfy the search condition. Indeed, appending a positive infinity to the array, we have an infinite amount of skipped elements.
ย 
At the end of this piece, we have the left and right positions so that at left, there are less than k numbers skipped, and at right there are at least k numbers skipped.
ย 
From this information we can infer how many numbers were skipped at position left and how much do we need to add to get k th number.
# continuation of find_kth_nonpresent(array)
if left == -1:
    return k

n_elements_skipped = array[left] - left - 1
return array[left] + (k - n_elements_skipped)
ย 
Inequality: strict or non-strict?
When constructing a rule for the binary search, we need to decide whether we are going to have a strict or a non-strict inequality for the number of skipped elements. This has to be decided case-by-case. The idea is to squeeze the k-th missing number between array[left] and array[right]. If the inequality were non-strict, we could have skipped exactly k numbers at position left and so, the k-th missing element would be positioned before left.
ย 
Debugging remarks
At the first go, I did not manage to get the formula right. In the search condition, I used array[mid] - mid + 1 instead of array[mid] - mid - 1. Sadly, the same pattern is reused for calculation of the number of elements skipped, so I replicated this bug twice due to code duplication. It is more reasonable to write a helper function, even though the overall solution is quite short, and use it for the property verification, as well as for further computation.
This modification makes the binary search more readable as well.
def n_elements_skipped(array: list[int], position: int) -> int:
    return array[position] - position - 1

def find_kth_nonpresent(array: list[int], k: int) -> int:
    left = -1
    right = len(array)
    
    while right - left > 1:
        mid = (left + right) // 2
        if n_elements_skipped(array, mid) < k:       # better
            left = mid
        else:
            right = mid
            
    if left == -1:
        return k

    return array[left] + (k - n_elements_skipped(array, left))