Gaurav's Blog

return rand();

Dealing With Arbitrary Precision Floating Point Operations

| Comments

Another interesting problem I solved recently on SPOJ was EVAL (Digits of e). It is a challenge problem, and it involved printing the largest number of digits of e as possible (upto 1 million digits), within the time-limit of 25 seconds. (SPOJ processors are single core 733 MHz Pentium III processors AFAIK).

As far as I know, there is no way to actually find the digits, than using one of the series used for finding the value of e. One of the series is, Taylor’s series.

e(x) = 1 + (x/1!) + (x^2/2!) + (x^3/3!) + ..

Thus, the value of e, can be calculated by substituting x=1. However, to obtain the value accurately upto 1 million digits is no small task. It requires arbitrary floating point ops.

Let me paint the picture better. The M_PI constant in C++ is correct upto 17 digits. Using Taylor’s series and the long double datatype, you can get to only 19 digits.

I resorted to Python, and the its in-built bignum library using the Decimal module. Using the Brother’s series expansion, the best I could get was a little more than 1000 digits, when using psyco.

e.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
import psyco
psyco.full()
from decimal import *
arr = {}
for i in range(1,240):
	arr[i]=2*i
getcontext().prec=1100
e = Decimal(2)
f = Decimal(1)
for i in range(1,240):
	a2=arr[i]
	f/=(a2)*(a2+1)
	e+=(f)*(a2+2);
print e

Then, I switched to Ruby, and used the BigDecimal module to get nearly 30k digits in time.

e.rb
1
2
3
4
5
6
7
8
9
10
11
require 'bigdecimal'
require 'bigdecimal/math'
include BigMath
e = BigDecimal('1.0')
t = BigDecimal('1.0')
for i in 1..8400
  t=t.div(i,32000)
  e+=t
end
s = e.to_s
print "2."+s[3..-1]

However, I think I can do much better in C++ if i have a good arbitrary-precision floating point operations library.

Any suggestions and pointers to better solutions are invited.

Longest Common Subsequence

| Comments

I was recently solving Aibophobia on the popular algorithm problem online judge, SPOJ.

The problem’s solution involves finding the Longest Common Subsequence of two strings.

A trivial solution to the problem is exponential in complexity. However, Dynamic Programming can be used to reduce the problem to time complexity $O(N^2)$.

There are usually two approaches to coding a Dynamic Programming solution. The first, is my favorite and very intuitive approach of Recursion with Memoisation, also known as the Top-down approach.

topDownLCS.cpp
1
2
3
4
5
6
7
8
9
10
11
12
13
int dp[6100][6100];
int lcs(int i, int j)
{
	int &res=dp[i][j];
	if(res!=-1) return res;
	if(i==n) return res=0;
	if(j==m) return res=0;
	
	res=max(lcs(i+1,j),lcs(i,j+1));
	if(a[i]==b[j])
		return res=max(1+lcs(i+1,j+1),res);
	return res;
}

Note that, here n, m are lengths of strings a and b, respectively and the dp array is initialized to -1. lcs(i,j) gives the value of the Longest Common Subsequence of strings a and b, starting from the i-th index of a and j-th index of b. Note that in lines 9-11 implement how the answer will be calculated recursively, and 6-7 define when the recursion will be terminated. Why is this fast? Because we memoize the calculated values in the dp array and if lcs(i,j) is called again, the cached value would be returned. The LCS of two strings would be returned by the call lcs(0,0).

Another approach is the bottom-up approach, which is easy to understand once you code the top-down approach. The problem with the previous top-down approach is that, in cases of large inputs, recursive calls, despite memoisation, might cause the solution to have a big constant. The bottom-up approach has the same $O(N^2)$ time complexity, but a lower constant.

My bottom-up approach to the problem was as follows:

bottomUpLCS1.cpp
1
2
3
4
5
6
7
8
9
10
11
12
13
14
int dp[6100][6100];
int lcs()
{
	for(int i=n-1;i>=0;i--)
	{
		for(int j=m-1;j>=0;j--)
		{
			dp[i][j]=max(dp[i+1][j],dp[i][j+1]);
			if(a[i]==b[j])
				dp[i][j]=max(dp[i][j],1+dp[i+1][j+1]);
		}
	}
	return dp[0][0];
}

Note, the same problem is now done in the reverse fashion. The dp array needs to be initialized to be 0 only for all values of dp[n][0], and dp[0][m].

However, there is one small optimization that can be done here. We do not need a 2D array for dp, since for dp[i][j], it only needs the values of the dp[i+1] row, other rows are redundant.

bottomUpLCS2.cpp
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
int dp[6100],prev[6100];
int lcs()
{
	for(int i=n-1;i>=0;i--)
	{
		for(int j=m-1;j>=0;j--)
		{
			dp[j]=max(prev[j],dp[j+1]);
			if(a[i]==b[j])
				dp[j]=max(dp[j],1+prev[j+1]);
		}
		memcpy(prev,dp,sizeof(int)*(m+1));
	}
	return dp[0];
}

The above solution reduces the problem to space complexity of $O(N)$. In practice, reducing the space complexity improves the actual running time, since the small array fits more easily in the cache.

Please refer the Dynamic Programming tutorial on TopCoder to learn more.