Gaurav's Blog

The Talking Stick Problem

We have $N$ ($N > 2$) nodes with synchronized clocks, each running an identical program. The program reads the incoming data on the network and generates some output, which is sent back onto the network. However, we want only one node to be transmitting this output at a time. If there is a failure, another node should pick up, but again only one node should transmit the output until it fails. Upon a failure, regular operation should resume within $f$ ms.

A naive approach is to keep an arbitrator, which expects a ping from the node holding the ‘talking stick’ every $f/4$ ms or so. If the ping does not arrive, it assigns the talking stick to some other node and operation resumes. Whenever a node comes back up after a failure, it has lost its talking stick. A glaring weakness of this approach is its single point of failure: the arbitrator itself. How do we solve this problem? We keep another node to make sure the arbitrator is alive. And how do we know that the new node is alive? By keeping yet another node to watch it all the time, and so on. Clearly not a feasible solution.
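To make the scheme concrete, here is a minimal sketch of the arbitrator loop in Python. The functions `receive_ping` and `grant_stick` are placeholders I have invented for illustration; a real system would have actual network transport behind them.

```python
def arbitrator(nodes, f_ms, receive_ping, grant_stick):
    """Hypothetical arbitrator loop. `receive_ping(timeout_ms)`
    blocks for at most that long and returns the sender's id, or
    None on timeout; `grant_stick(node)` tells a node to start
    talking. Both are placeholders for the real transport.
    """
    holder = nodes[0]
    grant_stick(holder)
    while True:
        if receive_ping(timeout_ms=f_ms / 4) is None:
            # No ping within f/4 ms: presume the stick-holder dead
            # and hand the stick to some other node.
            holder = next(n for n in nodes if n != holder)
            grant_stick(holder)
```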

Let us look at this solution: we arrange the nodes in a ring. A is connected to B, B to C, C to D, and so on. A is the node that initially begins talking. B knows A is the talking node, and expects a ping from A every $f/2$ ms or so. If A does not respond, B takes over the talking, and tells C to expect a ping from it at regular intervals. Looks better? Somewhat, but it ain’t correct. If B fails, and then A fails, C should take over; but C only knows that B is definitely not the primary and assumes it is A who is still talking, so A’s failure goes unnoticed and nobody takes over.

Another likely solution is this: each node is assigned a unique numerical id, from 1 to $N$. Each node knows its own id, and we are given that the clocks are already synchronized amongst the nodes. The talking node, apart from the data, transmits a ping to the other nodes to signal that it is alive. When the talking node $i$ fails, all nodes realize this, since they stop receiving its data on the network. Each node has a hard-coded timeout for $i$, after which they all expect $i+1$ to take over. If $i+1$ is up and running, it takes over and announces on the network that it is alive and in charge. If it is not, all nodes collectively time out on $i+1$ and start expecting $i+2$ to take over, and so on.
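Here is a rough Python sketch of the per-node logic, under the stated assumptions (synchronized clocks, ids 1 to $N$). `hearing_from` and `transmit` are hypothetical placeholders for the actual network I/O:

```python
import time

def node_loop(my_id, n_nodes, f_ms, hearing_from, transmit):
    """Hypothetical per-node loop for the id-based scheme.
    `hearing_from(i)` reports whether node i's data or ping has
    been seen recently; `transmit()` sends the output plus the
    'alive and in charge' ping. Both are placeholders.
    """
    timeout = f_ms / n_nodes      # per-candidate takeover budget
    talker = 1                    # node 1 talks initially
    while True:
        if talker == my_id:
            transmit()
        elif not hearing_from(talker):
            # All nodes time out on `talker` together (clocks are
            # synchronized) and expect the next id to take over.
            talker = talker % n_nodes + 1
        time.sleep(timeout / 1000)   # f_ms is in milliseconds
```

Since every node runs the same loop against the same synchronized clock, they all advance `talker` in lockstep, and the first live candidate starts transmitting within its $f/N$ ms slot.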

The above has a couple of shortcomings. First, the timeout for each node might have to be very small if $N$ is large: if the new node is supposed to take over within $f$ ms, the timeout for each node can be no more than $f/N$ ms (e.g., with $N = 1000$ and $f = 100$ ms, each node gets only $0.1$ ms). I am not sure whether a large cluster of machines can be synchronized to such accuracy. Second, if all the nodes between $i$ and $i+101$ fail, we wait through 100 timeouts even though those nodes are dead.

I am still thinking of how to improve this further. Suggestions?


Lower bound on comparisons in a comparison-based sort

I was asked this question recently. Here is a proof based on Information Theory.

Assume that we have to sort an array of $n$ elements, all of which are distinct. Hence, there are $n!$ possible permutations, out of which only one is sorted.

Now let the lower bound on the number of comparisons be $f(n)$. Each comparison has only two outcomes, so $f(n)$ comparisons can distinguish at most $2^{f(n)}$ different orderings, and a correct algorithm must distinguish all $n!$ of them. Thus,

$2^{f(n)} \geqslant n!$

has to hold.

Therefore,

$f(n) \geqslant \log_2 (n!)$

which, by Stirling’s approximation, is $\Omega(n \log_2 n)$. Knuth also gives a similar explanation in Volume 3.
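
One way to see the asymptotics without the full Stirling formula: each of the last $n/2$ factors of $n!$ is at least $n/2$, so

$\log_2 (n!) \geqslant \frac{n}{2} \log_2 \frac{n}{2} = \Omega(n \log_2 n)$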


Compendium of Nice Algorithmic Observations

Often we come across nice algorithmic observations. I am going to note some of them down here as and when I find them.

  • A directed graph is acyclic if and only if it has a valid topological sort. (By Dhruv)

  • The longest path in a graph can be found by negating the weights of the edges; the length of the longest path is then the absolute value of the shortest path in the new graph. (The negated graph has negative edge weights, so this needs a shortest-path method that tolerates them, e.g., on a DAG; for general graphs, the longest-path problem is NP-hard.)

  • The shortest (and the longest) path in a Directed Acyclic Graph (DAG) can be found in linear time. (Hint: Topological Sort, Dynamic Programming; see the first sketch after this list.)

  • The longest path in a tree (its diameter) can be found with two depth-first searches: DFS from any node to find the farthest node $u$, then DFS from $u$; the farthest distance reached is the length of the longest path. (See the second sketch after this list.)
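
For the DAG bullet above, here is a sketch of the topological-sort plus dynamic-programming approach in Python; the function name and edge format are mine. It computes the longest path with `max`; the shortest path from a fixed source works the same way with `min` and infinite initial distances, or by negating the weights as in the second bullet.

```python
from collections import defaultdict, deque

def longest_path_dag(n, edges):
    """Length of the longest path in a DAG with nodes 0..n-1 and
    weighted edges (u, v, w), in O(V + E) time."""
    adj = defaultdict(list)
    indeg = [0] * n
    for u, v, w in edges:
        adj[u].append((v, w))
        indeg[v] += 1
    # Kahn's algorithm for a topological order.
    order, queue = [], deque(i for i in range(n) if indeg[i] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    # dist[v] = length of the longest path ending at v.
    dist = [0] * n
    for u in order:
        for v, w in adj[u]:
            dist[v] = max(dist[v], dist[u] + w)
    return max(dist)
```

For example, `longest_path_dag(4, [(0, 1, 2), (1, 2, 3), (2, 3, 4)])` returns $9$, the length of the path $0 \to 1 \to 2 \to 3$.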
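
And for the tree bullet, a sketch of the two-DFS trick (the function name and adjacency format are mine):

```python
def tree_diameter(adj):
    """Longest path (diameter) of a tree given as an adjacency
    list {node: [(neighbor, weight), ...]}: DFS from any node to
    the farthest node u, then DFS from u; the farthest distance
    found is the diameter."""
    def farthest(src):
        best_node, best_dist = src, 0
        stack = [(src, None, 0)]
        while stack:
            node, parent, dist = stack.pop()
            if dist > best_dist:
                best_node, best_dist = node, dist
            for nxt, w in adj[node]:
                if nxt != parent:
                    stack.append((nxt, node, dist + w))
        return best_node, best_dist

    u, _ = farthest(next(iter(adj)))   # first DFS: find one endpoint
    _, diameter = farthest(u)          # second DFS: measure from it
    return diameter
```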
