Gaurav's Blog

return rand();

Recursion Is Ubiquitous…


… and so simple and elegant. In some sense, it is a piece of art, and one needs artists to appreciate it. I had this semi-non-trivial question in my mind:

Print all the permutations of a given string S.

I had solved this before, but all I could recollect was that I had probably made it a little too dirty that time. The thing is, even with non-trivial problems, elegant solutions are possible with simple application of tools like Recursion / Divide & Conquer.

Prof. Bender asked us to visualize all the algorithmic techniques we learnt in CSE548 as ‘tools in our toolbox’. Recursion / D&C is a very important tool IMHO.

How do we break this problem into a smaller problem?

Simple.

We decide only which character appears at the beginning of the permutation, find all permutations of the remaining string, and stick that first character in front of each of them.

When only one character is left, that character by itself is the only permutation.

The code is as follows:

Permutations.cpp
#include <algorithm>
#include <iostream>
#include <string>
using namespace std;

string str;

// Print every permutation of str[i..], keeping str[0..i-1] fixed.
void permute(int i)
{
  if(i == (int)str.size()-1)
  {
    // Only one character left: str itself is the permutation.
    cout << str << endl;
    return;
  }

  // Keep str[i] where it is and permute the rest.
  permute(i+1);
  // Then try every other character at position i.
  for(int j = i+1; j < (int)str.size(); j++)
  {
    swap(str[i], str[j]);
    permute(i+1);
    swap(str[i], str[j]); // restore before the next choice
  }
}

void wrapper()
{
  str = "abcd";
  sort(str.begin(), str.end());
  permute(0);
}

int main()
{
  wrapper();
  return 0;
}

Isn’t it beautiful? It’s magical how we apparently do no work at each step, and yet it all stitches together.
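As a sanity check (my addition, not part of the original post), the standard library can enumerate the same permutations with std::next_permutation; comparing counts is a quick way to convince yourself the recursion misses nothing:

```cpp
#include <algorithm>
#include <string>

// Counts the permutations of s by walking through them in lexicographic
// order with std::next_permutation, which requires a sorted starting point.
int count_permutations(std::string s)
{
  std::sort(s.begin(), s.end());
  int count = 0;
  do {
    ++count;  // or print s here to compare against permute()'s output
  } while (std::next_permutation(s.begin(), s.end()));
  return count;
}
```

For "abcd" this yields 24 = 4!, matching the number of lines the recursive version prints.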

The Talking Stick Problem


We have N (> 2) nodes with clocks synchronized among themselves, each running a copy of a particular process. Each process reads the incoming data on the network and generates the same output. We need the output to be written onto the network, but we want only one node to be transmitting it at a time. If that node fails, another node should pick up, and again only that one node should transmit the output until it fails. Upon a failure, the operation should resume within f ms.

An easy approach is to keep an arbitrator, who expects a ping from the node with the talking stick every (f/4) ms or so. If the ping does not arrive, it assigns the talking stick to some other node and resumes operation. Whenever a failed node comes back up, it has lost its talking stick. A glaring flaw of this approach is its single point of failure. How do we solve that? We keep another node to make sure the arbitrator is alive. How do we know that this new node is alive? By keeping yet another node to watch it all the time. And, congratulations! We just landed ourselves an instance of the Von Neumann Catastrophe.

Let us look at this solution: We arrange the nodes in a ring. A is connected to B, B to C, C to D, and so on. A is the node which begins talking. B knows A is the talking node, and expects a ping from A every f/2 ms or so. If A does not respond, B takes over the talking, and tells C to expect a ping from it at regular intervals. Looks better? Somewhat, but it ain’t correct. If B fails, and then A fails, C should take over; but nobody ever told C that B became the primary, so as far as C knows it’s still A who is talking, and C never steps in.

A likely solution is this: Each node is assigned a unique numerical id, from 1 to N. Each node knows its own id, and we are given that the clocks are already synchronized amongst the nodes. The talking node, apart from the data, can transmit a ping to the other nodes to show it is alive. When node i, which was the talking node, fails, all nodes realize this, since they stop receiving data from it on the network. Each node has a hard-coded timeout for i, after which they all expect i+1 to take over. If i+1 is up and running, it takes over and transmits on the network that it is alive and in charge. If it is not, all nodes collectively time out on i+1 and start expecting i+2 to take over, and so on.
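Because the clocks are synchronized, every node can compute the takeover schedule locally, with no extra messages. Here is a minimal sketch of that computation; the function name and the even split of the f ms budget among the N candidates are my assumptions, not part of the protocol as stated:

```cpp
// Given that node `failed` (the current talker) went silent at time
// `detect_ms` on the shared clock, compute the deadline by which candidate
// node `candidate` must have taken over. Ids are 1..N and wrap around the
// ring, and each candidate gets an equal slot of f/N ms.
long long takeover_deadline_ms(int failed, int candidate, int N,
                               long long detect_ms, long long f_ms)
{
  // Number of candidates tried (and timed out on) before `candidate`'s turn.
  int steps = ((candidate - failed - 1) % N + N) % N;
  long long slot = f_ms / N;  // each candidate's time budget
  return detect_ms + (steps + 1) * slot;
}
```

With N = 4, f = 100 ms, and node 1 failing at t = 0, node 2 must take over by t = 25 ms, node 3 by t = 50 ms, and so on around the ring; since every node evaluates the same function, they all agree on who is expected to be talking at any instant.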

The above has a couple of shortcomings. First, the timeout for each node to take over might be very small if N is large: if the new node is supposed to take over within f ms, the time-out for each node cannot be more than f/N ms, and I am not sure whether a large cluster of machines can be synchronized to such accuracy. Second, if all the nodes between i and i+101 fail, we wait through 100 time-outs even though those nodes are dead.

I am still thinking of how to improve this further. Suggestions?

Lower Bound on Comparisons in a Comparison-based Sort


I was asked this question recently. Here is a proof based on Information Theory.

Assume that we have to sort an array of n elements, all of which are distinct. Hence, there are $n!$ possible permutations, out of which only one is sorted.

Now let $f(n)$ be a lower bound on the number of comparisons. Any correct algorithm must be able to distinguish between all $n!$ orderings, and since each comparison has only 2 outcomes, $f(n)$ comparisons can distinguish at most $2^{f(n)}$ of them. Thus,

$2^{f(n)} \geqslant n! $

has to hold.

Therefore,

$f(n) \geqslant \log _2 ( n!) $

which, by Stirling’s approximation, is $\Omega(n \log_2 n)$. Knuth also gives a similar explanation in Volume 3 of The Art of Computer Programming.
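For concreteness (my own illustration, not from the post), the bound $\lceil \log_2 n! \rceil$ is easy to compute without overflowing $n!$ by summing logarithms:

```cpp
#include <cmath>

// Lower bound on worst-case comparisons for sorting n distinct elements:
// ceil(log2(n!)), computed as a sum of log2(k) to avoid computing n! itself.
long long min_comparisons(int n)
{
  double bits = 0.0;
  for (int k = 2; k <= n; k++)
    bits += std::log2((double)k);
  return (long long)std::ceil(bits - 1e-9);  // guard against FP round-up
}
```

For n = 5 this gives 7, which happens to be the exact worst-case optimum for sorting 5 elements, so the information-theoretic bound is tight there.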