Gaurav's Blog

return rand();

Putting My Twitter Friends and Followers on the Map

I was quite impressed by the Visualizing Friendships post on the Facebook Engineering blog, so I decided to try out some data visualization myself. Of course, I am no longer an intern at Facebook (I interned there this summer; a post about that is coming up soon), so I don’t have access to the millions of edges used in that map. Instead, I decided to do something similar for Twitter.

To plot anything, I first needed to find out where my friends and followers were located. I used the Twitter API to get the list of my friends and followers, and then looked up the location of each user. This is not quite simple, since I want each location as a latitude-longitude pair, and not everyone mentions their real location in their Twitter profile.

The Twitter API had two basic problems:

  1. It was slow.
  2. It placed tight restrictions on how many results you could get at once.

But I waded through both of them by (a) being patient and (b) batching up requests in groups of 100. This got me the public data about those of my friends whose profiles were publicly accessible.

Now, many of them list locations that have multiple names, are misspelled, or do not accurately pinpoint where the person lives. For example, a friend who lives in Mumbai can write ‘Mumbai’, ‘Bombay’ (the old name of Mumbai), ‘Aamchi Mumbai’ (a popular phrase, which translates to ‘Our Mumbai’), or ‘Maharashtra’ (the state). Thankfully, I found the Yahoo! PlaceFinder API, which more or less solves this problem: we can query it with a place name, and it returns its best guesses for the lat-long pair.
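The batching of requests in groups of 100 can be sketched roughly as follows. This is a minimal illustration in Python, not the actual script: `fetch_batch` is a hypothetical callable standing in for whatever API request returns the profiles for a list of user ids, and the names `batched` and `lookup_all` are made up for this example.

```python
from typing import Callable, Iterator, List, Sequence

BATCH_SIZE = 100  # group size mentioned in the post


def batched(ids: Sequence[int], size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Yield consecutive slices of `ids`, each holding at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def lookup_all(
    ids: Sequence[int],
    fetch_batch: Callable[[List[int]], List[dict]],
    size: int = BATCH_SIZE,
) -> List[dict]:
    """Call `fetch_batch` once per batch and collect all returned profiles.

    `fetch_batch` is a hypothetical stand-in for the real API request.
    """
    profiles: List[dict] = []
    for batch in batched(ids, size):
        profiles.extend(fetch_batch(batch))
    return profiles
```

With 250 user ids, this would make three requests (100, 100 and 50 ids) instead of 250 separate ones.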

Once I had the coordinates, I used R (thanks to Aditya for the suggestion) to plot the lat-long pairs on a world map. The output isn’t quite pleasing to the eye, but it does the job.
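The plot itself was made in R, and that code is not shown here. Purely as an illustration of what placing a lat-long pair on a flat world map involves, here is a small Python sketch that assumes the simplest possible projection (equirectangular, where longitude maps linearly to x and latitude to y). The function name `project` and the pixel-based coordinate system are assumptions made for this example.

```python
from typing import Tuple


def project(lat: float, lon: float, width: int, height: int) -> Tuple[float, float]:
    """Map a (lat, lon) pair in degrees to (x, y) on an equirectangular map.

    x grows eastwards from the left edge (lon = -180);
    y grows downwards from the top edge (lat = +90).
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude or longitude out of range")
    x = (lon + 180.0) / 360.0 * width
    y = (90.0 - lat) / 180.0 * height
    return x, y
```

On a 360 by 180 pixel map, the point at latitude 0, longitude 0 lands in the centre, at (180.0, 90.0).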

You can find the script that gets the lat-long pairs here.

Edit: Since Yahoo! is commercializing their PlaceFinder API, I think the Google Geocoding API should be a suitable replacement. (Thanks to Dhruv for the suggestion.)
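Swapping one geocoding service for another is easier if the lookup is kept behind a small interface. The Python sketch below is one way to do that, not the code from the script: `Geocoder` is a hypothetical callable that takes a place name and returns a best-guess lat-long pair (or `None`), and it could be backed by either service. It also looks up each distinct location string only once, which helps when many users write the same place.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

LatLong = Tuple[float, float]
# Hypothetical interface: place name in, best-guess (lat, long) or None out.
Geocoder = Callable[[str], Optional[LatLong]]


def normalize(location: str) -> str:
    """Collapse case and whitespace so trivially different spellings share one lookup."""
    return " ".join(location.lower().split())


def geocode_all(
    locations: Iterable[str], geocoder: Geocoder
) -> Dict[str, Optional[LatLong]]:
    """Resolve each distinct, non-empty location once using the given geocoder."""
    cache: Dict[str, Optional[LatLong]] = {}
    for raw in locations:
        key = normalize(raw)
        if key and key not in cache:
            cache[key] = geocoder(key)
    return cache
```

Note that `normalize` only merges spellings like ‘Mumbai’ and ‘ mumbai ’; recognising that ‘Bombay’ is the same city is still left to the geocoding service.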
