06 Nov 2012 - 2 minute read
I was quite impressed by the Visualizing Friendships post on the Facebook Engineering blog, so I decided to try some data visualization myself. Of course, I am no longer an intern at Facebook (I interned there this summer; a post about that is coming up soon), so I don’t have access to the millions of edges used in that map. Instead, I decided to do something similar for Twitter.
To plot anything, I first needed to find where my friends and followers were located. I used the Twitter API to get the list of my friends and followers, and then looked up each user’s location. This is not as simple as it sounds, since I wanted each location as a latitude-longitude pair, and not everyone mentions their real location in their Twitter profile.
The Twitter API had two basic problems:
But I worked around both of them by (a) being patient and (b) batching up requests in groups of 100. This got me the public data for those friends whose profiles were publicly accessible. Many of them list places that have multiple names, are misspelled, or do not accurately pinpoint the person’s location. For example, a friend who lives in Mumbai might write ‘Mumbai’, ‘Bombay’ (the old name of Mumbai), ‘Aamchi Mumbai’ (a popular phrase that translates to ‘Our Mumbai’), or ‘Maharashtra’ (the state). Thankfully, I found the Yahoo! PlaceFinder API, which more or less solves this problem: you query it with a place name, and it returns its best guesses for the latitude-longitude pair.
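The batching step can be sketched roughly as follows. This is only an illustration, not the script linked from this post: `batched`, `lookup_all`, and the `fetch_batch` callback are hypothetical names, and `fetch_batch` stands in for whatever call actually sends one group of IDs to the API.

```python
from typing import Callable, Iterable, Iterator, List


def batched(ids: Iterable[int], size: int = 100) -> Iterator[List[int]]:
    """Yield consecutive lists of at most `size` IDs."""
    batch: List[int] = []
    for user_id in ids:
        batch.append(user_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, batch
        yield batch


def lookup_all(ids: Iterable[int],
               fetch_batch: Callable[[List[int]], List[dict]]) -> List[dict]:
    """Call the hypothetical `fetch_batch` once per group of up to 100 IDs."""
    profiles: List[dict] = []
    for group in batched(ids, 100):
        profiles.extend(fetch_batch(group))
    return profiles
```

Passing the request function in as a parameter keeps the batching logic independent of any particular API client, so it can be tried out with a stub before making real requests.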
Once I had the lat-long pairs, I used R (thanks to Aditya for the suggestion) to plot them on a world map. The output isn’t quite pleasing to the eye, but it does the job.
You can find the script that gets the Lat-Long pairs here.
Edit: Since Yahoo! is commercializing their PlaceFinder API, I think the Google GeoCoding API should be a suitable replacement. (Thanks to Dhruv for the suggestion).
06 Nov 2012 - less than 1 minute read
If you use the terminal a lot, it pays to find fast ways of doing routine tasks. Aliases are one of the many tricks available.
Here is how you can set up a bash alias:
alias smallcmd='very long and verbose command'
You can then type
smallcmd
and bash will expand it to the longer command. To make this permanent, add the aliases to the end of your .bashrc file.
I have even aliased ‘git pull’, ‘git push’, ‘git commit’, and ‘git status’ to three-letter mnemonics, which are really handy since I do a lot of work using git.
SSH aliases are something I didn’t know about until very recently. Instead of typing long, unwieldy usernames and hostnames, you can add an alias for the machine to your SSH config file, usually at ~/.ssh/config. For example:
Host ubuntuvm
    HostName 172.16.29.143
    User reddragon
Now, I can just do
ssh ubuntuvm
to SSH to my Virtual Machine!
16 Sep 2012 - 1 minute read
I was trying to implement the Miller-Rabin algorithm in Python for one of my homework assignments. I had already done this in C++ to solve PON, but I wanted to try out Python and see how fast I could make it. I wrote a function to compute $a^b \bmod n$ using repeated squaring. However, the implementation wasn’t fast enough. Dhruv suggested that I profile the code.
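For reference, repeated squaring can be sketched as below. This is a minimal iterative version written for illustration, not the function from my homework; the name `mod_exp` is made up, and it assumes a non-negative exponent and a positive modulus.

```python
def mod_exp(a: int, b: int, n: int) -> int:
    """Compute (a ** b) % n by repeated squaring, for b >= 0 and n >= 1."""
    result = 1 % n          # 1 % n is 0 when n == 1, which is the right answer
    a %= n
    while b > 0:
        if b & 1:           # lowest bit of b is set: multiply this power in
            result = (result * a) % n
        a = (a * a) % n     # square the base for the next bit
        b >>= 1
    return result
```

The loop runs once per bit of the exponent, so the number of multiplications grows with the logarithm of `b` rather than with `b` itself.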
Here is how you profile a Python script:
python -m cProfile script.py
And this is what it reported:
         172 function calls (66 primitive calls) in 31.492 seconds
   Ordered by: standard name
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 __future__.py:48(<module>)
        1    0.000    0.000    0.000    0.000 __future__.py:74(_Feature)
        7    0.000    0.000    0.000    0.000 __future__.py:75(__init__)
        1    0.020    0.020    0.020    0.020 hashlib.py:55(<module>)
        6    0.000    0.000    0.000    0.000 hashlib.py:94(__get_openssl_constructor)
        1    0.063    0.063   31.492   31.492 millerRabin.py:1(<module>)
    108/2   16.644    0.154   16.644    8.322 millerRabin.py:12(fastModularExp)
        2    0.008    0.004   16.672    8.336 millerRabin.py:21(millerRabin)
        1    0.000    0.000    0.000    0.000 os.py:743(urandom)
Note how the fastModularExp() function took a staggering 16 seconds. It turned out that Python’s built-in pow function already does fast modular exponentiation when given the modulus as a third argument, and since it is implemented in C, it is quite fast. After switching to it, I submitted the code as a solution for PON and got accepted with 0.35 seconds, which was faster than my 0.90-second submission in C++ :-)
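As a rough sketch of what the faster version looks like, here is a Miller-Rabin test built on the three-argument pow. It is not my submitted solution: the name `is_probable_prime`, the default of 20 rounds, and the small-prime pre-check are all choices made for this illustration.

```python
import random


def is_probable_prime(n: int, rounds: int = 20) -> bool:
    """Miller-Rabin test: False means composite, True means probably prime."""
    if n < 2:
        return False
    # Quick trial division; also guarantees n >= 17 below.
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)   # random base in [2, n - 2]
        x = pow(a, d, n)                 # built-in fast modular exponentiation
        if x == 1 or x == n - 1:
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                 # a is a witness that n is composite
    return True
```

Each round with a random base wrongly passes a composite number with probability at most 1/4, so more rounds trade a little time for more confidence.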