Longest Increasing Subsequence
The longest increasing subsequence is definied as follows: in a given array of numbers, to find a subsequence (not necessarily contiguous) that is strictly increasing which has the maximum length.
The following algorithm finds the longest increasing subsequence in O(n log n) time.
Key Implementation
Data Structure
This algorithm (nlogn algorithm) uses 2 array to store information.
itemIndexWithLength[l]
records index of the smallest item which is the end of a subsequence with length l
. It is used to maintain the longest increasing subsequence found so far.
int[] itemIndexWithLength = new int[size+1];
preItem[i]
records the index of i
's previous item in the longest increasing subsequence
int[] preItem = new int[size];
Algorithm Steps
Take a loop from 0 to size-1
For each item in the loop, we try to find the best position it can fit in. By 'fit' and 'best' we mean: for a target subsequence length value len
:
list[itemIndexWithLength[len]]
() is smaller then item. So current item can be a post-item of that mid-length subsequence.len
is the biggest value we can find.
This algorithm use binary search to find the best value of len
:
for(int i=0;i < size;i++){
int left = 1;
int right = currentMax;
while(left<=right){
int mid = (left+right)/2;
//get the item which is the end point of a mid-length subsequence and also the smallest item
int itemWithMidLength = itemIndexWithLength[mid];
if(list[itemWithMidLength] < list[i]){
//if fit, try longer (maybe it works too)
left = mid + 1;
}else{
//cannot fit, a subsequence end at i can not be mid long, try shorter
right = mid - 1;
}
}
}
This for-loop runs n times. Inside the for-loop, the while-loop continues log(n) times since each time a half of the target values can be eliminated. It takes a constant time to run each while-loop because all we have to do is read a value, take a comparison and set a value.
The algorithm takes nlog(n) time overall.
Testing
Dynamic Programming Algorithm
To test the accuracy and efficiency of this algorithm, a dynamic programing algorithm in O(n^2) has been used as comparison.
int maxLength[] = new int[size];
int preItem[] = new int[size];
maxLength[0] = 1;
preItem[0] = -1;
for(int i=1;i<size;i++){
//find the longest possible item in O(n)
int idx = 0;
int longest = 0;
for(int j=0;j<i;j++){
if(maxLength[j]>longest && list[j]<list[i]){
longest = maxLength[j];
idx = j;
}
}
maxLength[i] = longest + 1;
preItem[i] = idx;
}
maxLength[i]
means the longest increasing subsequence which ends at i
. To solve the situtaion of i
form i-1
, it uses the following recurrence relation:
M[i] = max{M[j]+1}; (j in 0...i-1 and item[j] < item[i])
Test Result (Comparing test)
In comparing testing, the size of input array is set in the range from 100 to 200000.
> (the size of array of all the test cases)
<tr> <td>100</td><td>500</td><td>1000</td><td>2000</td></td><td>5000</td><td>10000</td><td>20000</td> </tr> <tr> <td>35000</td><td>50000</td><td>75000</td><td>100000</td><td>120000</td><td>135000</td><td>150000</td> </tr> <tr> <td>160000</td><td>170000</td><td>180000</td><td>190000</td><td>200000</td> </tr>
To get an average running time, each test case will be run for 5 times. Here is the running results:
Size 100 500 1000 2000 5000 10000 20000 nlogn(ms) 0 0 0 0 0 0 1 n^2(ms) 0 2 1 6 43 193 769 Size 35000 50000 75000 100000 120000 135000 150000 nlogn(ms) 2 3 5 7 8 8 11 n^2(ms) 2359 4840 10872 19500 27919 34710 42866 Size 160000 170000 180000 190000 200000 nlogn(ms) 11 11 14 14 14 n^2(ms) 47621 54482 62146 68263 75708 The efficiency of this nlogn algorithm(O(nlogn)) can be easily observed when compared with a n^2 dynamic programming algorithm(O(n^2)).
Test Result (Single test)
In single test, the size of input is set to 4000000 as maximum. Here is the running result:
Size 200000 300000 400000 500000 600000 700000 Time(ms) 18 22 33 37 51 60 Size 800000 900000 1000000 1200000 1300000 1400000 Time(ms) 62 80 83 103 111 116 Size 1500000 1600000 1700000 1800000 1900000 2000000 Time(ms) 130 132 137 146 158 169 Size 2100000 2200000 2300000 2400000 2500000 2700000 Time(ms) 176 191 192 205 222 223 Size 3000000 3250000 3500000 3800000 4000000 Time(ms) 262 292 308 337 347 The following graph shows the time cost of this nlogn algorithm(O(n log n)).
Java File###
Download: LongestIncreasingSubsequence.java
Reference###
Amrinder Arora. Analysis And Design of Algorithms
Wikipedia. Longest increasing subsequence