Gaurav's Blog

return rand();

Longest Common Subsequence

| Comments

I was recently solving Aibophobia on the popular algorithm problem online judge, SPOJ.

The problem’s solution involves finding the Longest Common Subsequence of two strings.

A trivial solution to the problem is exponential in complexity. However, Dynamic Programming can be used to reduce the problem to time complexity $O(N^2)$.

There are usually two approaches to coding a Dynamic Programming solution. The first, is my favorite and very intuitive approach of Recursion with Memoisation, also known as the Top-down approach.

topDownLCS.cpp
1
2
3
4
5
6
7
8
9
10
11
12
13
int dp[6100][6100];
int lcs(int i, int j)
{
	int &res=dp[i][j];
	if(res!=-1) return res;
	if(i==n) return res=0;
	if(j==m) return res=0;
	
	res=max(lcs(i+1,j),lcs(i,j+1));
	if(a[i]==b[j])
		return res=max(1+lcs(i+1,j+1),res);
	return res;
}

Note that, here n, m are lengths of strings a and b, respectively and the dp array is initialized to -1. lcs(i,j) gives the value of the Longest Common Subsequence of strings a and b, starting from the i-th index of a and j-th index of b. Note that in lines 9-11 implement how the answer will be calculated recursively, and 6-7 define when the recursion will be terminated. Why is this fast? Because we memoize the calculated values in the dp array and if lcs(i,j) is called again, the cached value would be returned. The LCS of two strings would be returned by the call lcs(0,0).

Another approach is the bottom-up approach, which is easy to understand once you code the top-down approach. The problem with the previous top-down approach is that, in cases of large inputs, recursive calls, despite memoisation, might cause the solution to have a big constant. The bottom-up approach has the same $O(N^2)$ time complexity, but a lower constant.

My bottom-up approach to the problem was as follows:

bottomUpLCS1.cpp
1
2
3
4
5
6
7
8
9
10
11
12
13
14
int dp[6100][6100];
int lcs()
{
	for(int i=n-1;i>=0;i--)
	{
		for(int j=m-1;j>=0;j--)
		{
			dp[i][j]=max(dp[i+1][j],dp[i][j+1]);
			if(a[i]==b[j])
				dp[i][j]=max(dp[i][j],1+dp[i+1][j+1]);
		}
	}
	return dp[0][0];
}

Note, the same problem is now done in the reverse fashion. The dp array needs to be initialized to be 0 only for all values of dp[n][0], and dp[0][m].

However, there is one small optimization that can be done here. We do not need a 2D array for dp, since for dp[i][j], it only needs the values of the dp[i+1] row, other rows are redundant.

bottomUpLCS2.cpp
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
int dp[6100],prev[6100];
int lcs()
{
	for(int i=n-1;i>=0;i--)
	{
		for(int j=m-1;j>=0;j--)
		{
			dp[j]=max(prev[j],dp[j+1]);
			if(a[i]==b[j])
				dp[j]=max(dp[j],1+prev[j+1]);
		}
		memcpy(prev,dp,sizeof(int)*(m+1));
	}
	return dp[0];
}

The above solution reduces the problem to space complexity of $O(N)$. In practice, reducing the space complexity improves the actual running time, since the small array fits more easily in the cache.

Please refer the Dynamic Programming tutorial on TopCoder to learn more.

Comments