Problem. Identify the length of the longest increasing subsequence of a given sequence.
You can test your solutions with LeetCode problem #300. The difficulty “medium” corresponds to a quadratic DP solution, while the presented approach in would be rather classified as “hard”, and it works for larger arrays within given time constraints.
This post is an adaptation of a translated post from e-maxx.ru, check out other algorithms there.
Reminder. Python’s slicing operation corresponds to a half-open interval , which is equivalent to
Dynamic programming approach
A decent dynamic programming approach in .
- Let be the initial array.
- Let denote the length of the longest increasing subsequence of the array ending in position .
Then, we can recursively calculate the values by trying all possible last positions and updating if necessary:
Conventionally, if the set is empty, we set its max to 0, because there would be no paths starting before the current position, and therefore, the maximal length of a sequence terminating at that point should be 1.
Finding the value of the next element requires scanning over elements, and therefore, the final complexity is quadratic.
This algorithm is easy to code, but there exists a faster algorithm which uses a mixture of binary search and dynamic programming.
Retrospective
In order to use binary search, we need to think whether we could do the scanning faster, in operations instead of . In order to carry out a binary search, we need a monotonic property over this sequence. In other words, the array needs to be partitioned into two consecutive segments such that the property holds for the first segment and does not hold for the second one.
In the above example, we are using the property , which is not monotonic, when is fixed and runs over all possible values in . Extracting the maximum is also not in the scope of a binary search procedure.
However, the array itself is monotonic because adding new possible elements does not decrease the length of the sequence. We need to think of a different problem formulation satisfying a monotonic property.
A new recurrence design
Example. Imagine that we scanned the array recursively up to a given point and the longest path so far has length 3. Suppose that the longest increasing path has a form . Then, adding a new element could potentially form a subsequence of length 4. But it does not necessarily include the above fixed sequence with 3 elements.
Idea. We want to constrain the sequences in such a way that among all the sequences of length 3 we choose the one with the smallest terminating element. In other words, suppose that is as small as possible among all the sequences of length 3. Now, can we use as a prefix for a subsequence of length 4, if it exists?
Click to see the answer.
Yes, because if we want to create an increasing sequence of length 4, then its fourth element should be strictly larger than its third element. Therefore, there should already exist a sequence of length 3 whose last element is smaller than the new element.
This idea allows to define a new dynamic programming array eventually leading to a construction of the maximal sequence, term by term, by ensuring that the last element is as small as possible.
Definition. Let denote the index of the element where the subsequence of length terminates. If there are several such subsequences, we choose the one terminating in the element with the smallest value.
We need to be careful with this definition, though, because processing a new element might update a sequence of length , but does not necessarily update the sequence of lengths .
Exercise. Should be an increasing sequence at a given time point of processing the array ?
Click to see the answer.
No, here is a counterexample.
However, if we look at the values of the elements, and not their indices, we obtain a monotonic sequence!
Exercise. Sequence is strictly increasing.
Proof (click to expand).
At each time, for a given index , the element only may decrease after an update. However, by decreasing we make a new increasing subsequence which builds upon an existing smaller increasing subsequence. Therefore, .
Binary search Property. Fix a position in the array . We need to find the maximal index such that
By finding such index, we obtain the length of the longest sequence terminating in . If necessary, we can update accordingly.
Implementation detail. For simplicity, instead of keeping indices in , one can keep the values of the array. In order to do the backtracking in the end, one can use auxilliary arrays.
Pseudocode (click to expand).
- Input: .
- Initialise with and with .
- Initialise with .
- For each find the maximal such that .
- Set .
- The answer is the maximal index such that .
Implementation
Step 1.
The main dynamic programming is nothing but a simple for-loop.
def lengthOfLIS(self, nums: List[int]) -> int:
n = len(nums)
infinity = float("inf")
nums.append(infinity)
last_indices = [n for _ in range(n+1)]
last_indices[0] = -infinity
for i in range(n):
... # update the array last_indices accordingly
return max([
idx
for idx, elem in enumerate(last_indices)
if elem != n
])
Step 2.
Now, we are implementing the binary search inside
last_indices
.- Initial boundaries. , .
- Property.
- After the binary search is done, update
last_indices
.
# input: i in [0, n)
# we are currently inside the for-loop
left = 0
right = n
while right - left > 1:
mid = (right + left) // 2
if nums[last_indices[mid]] < nums[i]:
left = mid
else:
right = mid
last_indices[right] = i
Applications: Circus tower, Russian doll envelopes
If you successfully pass the LeetCode problem 300, you will be suggested several more.
An example is LeetCode problem 354 (hard) which requires the same method.
You are given a 2D array of integers
envelopes
where envelopes[i] = [
,
]
represents the width and the height of an envelope.One envelope can fit into another if and only if both the width and height of one envelope are greater than the other envelope's width and height.
Return the maximum number of envelopes you can Russian doll (i.e., put one inside the other).
This problem is hard because the maximal size can be so you cannot squeeze a quadratic algorithm into this constraint.
Solution. (click to expand)
Sort the pairs on one coordinate and find maximal increasing sequence on the second coordinate.
Follow-up. How would you implement the backtracking procedure?