Longest Increasing Subsequence
The longest increasing subsequence is definied as follows: in a given array of numbers, to find a subsequence (not necessarily contiguous) that is strictly increasing which has the maximum length.
The following algorithm finds the longest increasing subsequence in O(n log n) time.
Key Implementation
Data Structure
This algorithm (nlogn algorithm) uses 2 array to store information.
itemIndexWithLength[l]
records index of the smallest item which is the end of a subsequence with length l
. It is used to maintain the longest increasing subsequence found so far.
int[] itemIndexWithLength = new int[size+1];
preItem[i]
records the index of i
's previous item in the longest increasing subsequence
int[] preItem = new int[size];
Algorithm Steps
Take a loop from 0 to size-1
For each item in the loop, we try to find the best position it can fit in. By 'fit' and 'best' we mean: for a target subsequence length value len
:
list[itemIndexWithLength[len]]
() is smaller then item. So current item can be a post-item of that mid-length subsequence.len
is the biggest value we can find.
This algorithm use binary search to find the best value of len
:
for(int i=0;i < size;i++){
int left = 1;
int right = currentMax;
while(left<=right){
int mid = (left+right)/2;
//get the item which is the end point of a mid-length subsequence and also the smallest item
int itemWithMidLength = itemIndexWithLength[mid];
if(list[itemWithMidLength] < list[i]){
//if fit, try longer (maybe it works too)
left = mid + 1;
}else{
//cannot fit, a subsequence end at i can not be mid long, try shorter
right = mid - 1;
}
}
}
This for-loop runs n times. Inside the for-loop, the while-loop continues log(n) times since each time a half of the target values can be eliminated. It takes a constant time to run each while-loop because all we have to do is read a value, take a comparison and set a value.
The algorithm takes nlog(n) time overall.
Testing
Dynamic Programming Algorithm
To test the accuracy and efficiency of this algorithm, a dynamic programing algorithm in O(n^2) has been used as comparison.
int maxLength[] = new int[size];
int preItem[] = new int[size];
maxLength[0] = 1;
preItem[0] = -1;
for(int i=1;i<size;i++){
//find the longest possible item in O(n)
int idx = 0;
int longest = 0;
for(int j=0;j<i;j++){
if(maxLength[j]>longest && list[j]<list[i]){
longest = maxLength[j];
idx = j;
}
}
maxLength[i] = longest + 1;
preItem[i] = idx;
}
maxLength[i]
means the longest increasing subsequence which ends at i
. To solve the situtaion of i
form i-1
, it uses the following recurrence relation:
M[i] = max{M[j]+1}; (j in 0...i-1 and item[j] < item[i])
Test Result (Comparing test)
In comparing testing, the size of input array is set in the range from 100 to 200000.
<tr>
<td>100</td><td>500</td><td>1000</td><td>2000</td></td><td>5000</td><td>10000</td><td>20000</td>
</tr>
<tr>
<td>35000</td><td>50000</td><td>75000</td><td>100000</td><td>120000</td><td>135000</td><td>150000</td>
</tr>
<tr>
<td>160000</td><td>170000</td><td>180000</td><td>190000</td><td>200000</td>
</tr>
> (the size of array of all the test cases)
To get an average running time, each test case will be run for 5 times. Here is the running results:
Size | 100 | 500 | 1000 | 2000 | 5000 | 10000 | 20000 |
---|---|---|---|---|---|---|---|
nlogn(ms) | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
n^2(ms) | 0 | 2 | 1 | 6 | 43 | 193 | 769 |
Size | 35000 | 50000 | 75000 | 100000 | 120000 | 135000 | 150000 |
nlogn(ms) | 2 | 3 | 5 | 7 | 8 | 8 | 11 |
n^2(ms) | 2359 | 4840 | 10872 | 19500 | 27919 | 34710 | 42866 |
Size | 160000 | 170000 | 180000 | 190000 | 200000 | ||
nlogn(ms) | 11 | 11 | 14 | 14 | 14 | ||
n^2(ms) | 47621 | 54482 | 62146 | 68263 | 75708 |
The efficiency of this nlogn algorithm(O(nlogn)) can be easily observed when compared with a n^2 dynamic programming algorithm(O(n^2)).
Test Result (Single test)
In single test, the size of input is set to 4000000 as maximum. Here is the running result:
Size | 200000 | 300000 | 400000 | 500000 | 600000 | 700000 |
---|---|---|---|---|---|---|
Time(ms) | 18 | 22 | 33 | 37 | 51 | 60 |
Size | 800000 | 900000 | 1000000 | 1200000 | 1300000 | 1400000 |
Time(ms) | 62 | 80 | 83 | 103 | 111 | 116 |
Size | 1500000 | 1600000 | 1700000 | 1800000 | 1900000 | 2000000 |
Time(ms) | 130 | 132 | 137 | 146 | 158 | 169 |
Size | 2100000 | 2200000 | 2300000 | 2400000 | 2500000 | 2700000 |
Time(ms) | 176 | 191 | 192 | 205 | 222 | 223 |
Size | 3000000 | 3250000 | 3500000 | 3800000 | 4000000 | |
Time(ms) | 262 | 292 | 308 | 337 | 347 |
The following graph shows the time cost of this nlogn algorithm(O(n log n)).
Java File###
Download: LongestIncreasingSubsequence.java
Reference###
Amrinder Arora. Analysis And Design of Algorithms
Wikipedia. Longest increasing subsequence