Longest Increasing Subsequence

The longest increasing subsequence is definied as follows: in a given array of numbers, to find a subsequence (not necessarily contiguous) that is strictly increasing which has the maximum length.

The following algorithm finds the longest increasing subsequence in O(n log n) time.


Key Implementation

Data Structure

This algorithm (nlogn algorithm) uses 2 array to store information.

itemIndexWithLength[l] records index of the smallest item which is the end of a subsequence with length l. It is used to maintain the longest increasing subsequence found so far.

	int[] itemIndexWithLength = new int[size+1];

preItem[i] records the index of i's previous item in the longest increasing subsequence

	int[] preItem = new int[size];
Algorithm Steps

Take a loop from 0 to size-1
For each item in the loop, we try to find the best position it can fit in. By 'fit' and 'best' we mean: for a target subsequence length value len:

  • list[itemIndexWithLength[len]] () is smaller then item. So current item can be a post-item of that mid-length subsequence.
  • len is the biggest value we can find.

This algorithm use binary search to find the best value of len:

for(int i=0;i < size;i++){
	int left = 1;
	int right = currentMax;
	while(left<=right){
		int mid = (left+right)/2;
		//get the item which is the end point of a mid-length subsequence and also the smallest item
		int itemWithMidLength = itemIndexWithLength[mid];
		if(list[itemWithMidLength] < list[i]){
			//if fit, try longer (maybe it works too)
			left = mid + 1;
		}else{
			//cannot fit, a subsequence end at i can not be mid long, try shorter
			right = mid - 1;
		}
	}
}

This for-loop runs n times. Inside the for-loop, the while-loop continues log(n) times since each time a half of the target values can be eliminated. It takes a constant time to run each while-loop because all we have to do is read a value, take a comparison and set a value.
The algorithm takes nlog(n) time overall.


Testing

Dynamic Programming Algorithm

To test the accuracy and efficiency of this algorithm, a dynamic programing algorithm in O(n^2) has been used as comparison.

	int maxLength[] = new int[size];
	int preItem[] = new int[size];
	maxLength[0] = 1;
	preItem[0] = -1;
	for(int i=1;i<size;i++){
		//find the longest possible item in O(n)
		int idx = 0;
		int longest = 0; 
		for(int j=0;j<i;j++){
			if(maxLength[j]>longest && list[j]<list[i]){
				longest = maxLength[j];
				idx = j;
			}
		}
		maxLength[i] = longest + 1;
		preItem[i] = idx;
	}

maxLength[i] means the longest increasing subsequence which ends at i. To solve the situtaion of i form i-1, it uses the following recurrence relation:

M[i] = max{M[j]+1}; (j in 0...i-1 and item[j] < item[i])

Test Result (Comparing test)

In comparing testing, the size of input array is set in the range from 100 to 200000.

<tr>
    <td>100</td><td>500</td><td>1000</td><td>2000</td></td><td>5000</td><td>10000</td><td>20000</td>
</tr>
<tr>
    <td>35000</td><td>50000</td><td>75000</td><td>100000</td><td>120000</td><td>135000</td><td>150000</td>
</tr>
<tr>
<td>160000</td><td>170000</td><td>180000</td><td>190000</td><td>200000</td>
</tr>
> (the size of array of all the test cases)

To get an average running time, each test case will be run for 5 times. Here is the running results:

Size1005001000200050001000020000
nlogn(ms)0000001
n^2(ms)021643193769
Size350005000075000100000120000135000150000
nlogn(ms)23578811
n^2(ms)235948401087219500279193471042866
Size160000170000180000190000200000
nlogn(ms)1111141414
n^2(ms) 4762154482621466826375708

The efficiency of this nlogn algorithm(O(nlogn)) can be easily observed when compared with a n^2 dynamic programming algorithm(O(n^2)).


Test Result (Single test)

In single test, the size of input is set to 4000000 as maximum. Here is the running result:

Size200000300000400000500000600000700000
Time(ms)182233375160
Size8000009000001000000120000013000001400000
Time(ms)628083103111116
Size150000016000001700000180000019000002000000
Time(ms)130132137146158169
Size210000022000002300000240000025000002700000
Time(ms)176191192205222223
Size30000003250000350000038000004000000
Time(ms)262292308337347

The following graph shows the time cost of this nlogn algorithm(O(n log n)).



Java File###

Download: LongestIncreasingSubsequence.java


Reference###

Amrinder Arora. Analysis And Design of Algorithms
Wikipedia. Longest increasing subsequence