In an effort to actually have some projects to demonstrate at job interviews, I got a copy of Mining the Social Web by Matthew A. Russell. Now, this didn’t go well at first: the part of the API used in the first coding exercise, 1-3, has moved servers, and because of Twitter’s use of Yahoo! Where On Earth features, getting hold of the trends information from Twitter is a little more complicated. However, this is covered in the errata here: http://oreilly.com/catalog/errata.csp?isbn=0636920010203
O’Reilly and the author have done a good job of keeping this up to date. If you’re going to get the book (and so far, I can recommend it), keep this in mind.
So what have I been up to? Principally this script: retweet_visualization.py
This Python script is an adapted and annotated version of the one supplied in the book’s GitHub repository. Essentially, it queries Twitter (here, 15 pages of 100 tweets) and uses the NetworkX library to construct a graph from the results, which is then written out as a .dot file for processing. It searches for ‘via’ and ‘RT’ in the posts, and then maps who is retweeting and who has been retweeted (also stripping the @ from the retweeted name).
In order to run the script, you will first have to install NetworkX and the Twitter API package for Python:
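To give a feel for the retweet-matching step, here is a minimal sketch using only the standard library. It is not the book's code: the regular expression, the function names (retweet_sources, retweet_edges, to_dot), the edge direction (retweeted person to retweeter) and the sample tweets are all my own assumptions for illustration. The real script hands the edges to NetworkX instead of writing the DOT text by hand.

```python
import re

# Assumed pattern: "RT" or "via" as a whole word, followed by one or more @names.
RT_PATTERN = re.compile(r"\b(?:RT|via)((?:\W*@\w+)+)", re.IGNORECASE)


def retweet_sources(text):
    """Return the screen names (without the '@') credited in a tweet's text."""
    names = []
    for match in RT_PATTERN.finditer(text):
        # findall with one group returns just the name, so the '@' is stripped
        names.extend(re.findall(r"@(\w+)", match.group(1)))
    return names


def retweet_edges(tweets):
    """Yield (retweeted, retweeter) pairs from (author, text) tuples."""
    for author, text in tweets:
        for source in retweet_sources(text):
            yield (source, author)


def to_dot(edges):
    """Render the edges as DOT text that Graphviz can read."""
    lines = ["digraph retweets {"]
    for source, retweeter in sorted(set(edges)):
        lines.append('  "%s" -> "%s";' % (source, retweeter))
    lines.append("}")
    return "\n".join(lines)


# Made-up sample data, purely for illustration
sample = [
    ("alice", "RT @MathsJam: see you at the pub tonight"),
    ("bob", "Great puzzle (via @standupmaths)"),
    ("carol", "No retweet here, just maths"),
]
edges = list(retweet_edges(sample))
print(to_dot(edges))
```

Anything that looks like ‘RT @name’ or ‘via @name’ becomes an arrow from the person being retweeted to the person doing the retweeting, and tweets with neither marker add nothing to the graph.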
$ easy_install networkx
$ easy_install twitter
Once these are in, then you are ready to go! To run this script, crack open a terminal window (I’m using Linux, but if you’ve set it up for Windows/Mac then there should be no problems with this script) and type the following command:
$ python retweet_visualization.py [search term]
So, to search for all tweets containing ‘Mathsjam’ (which includes @MathsJam and #mathsjam), the command was:
$ python retweet_visualization.py mathsjam
To turn the DOT file into a picture of the digraph, I installed Graphviz ( http://www.graphviz.org/ ). Once the script has generated the .dot file, this command will create the graphic:
dot -Tpng ./out/twitter_retweet_visualization.html.dot -o twitter_retweet.png
And here is the output that you can expect:
Overall, a good few hours’ work! The next idea for this project is to find a better metric for relating people in a hierarchy using their tweets. Tweets are very handy: very small, easy to process and retrieve, and there are plenty of them on near-enough *all* subjects!
Interesting note for MathsJammers: @ColinTheMathmo and @stecks do not appear here at all, whereas @standupmaths and @Mathsjam are clearly retweeted a lot. I found this quite interesting, as it suggests that retweet counts alone are not a good enough metric to assess a hierarchy along.
Secondary results: Here is a script that uses the NLTK library (run ‘easy_install nltk’ before attempting this script) to do some very basic processing on a search for tweets with a particular term in them.
Here is the file: twitter_search.py
$ python twitter_search.py [search term]
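As a rough idea of what "very basic processing" on a batch of tweets can look like, here is a small standard-library sketch that counts the most common terms. This is a stand-in, not the contents of twitter_search.py: it uses collections.Counter rather than NLTK, and the tokenizer, the tiny stopword list, the function names and the sample tweets are all assumptions of mine.

```python
import re
from collections import Counter

# A tiny, made-up stopword list; a real one would be much longer
STOPWORDS = frozenset(["the", "a", "at", "to", "rt", "and", "of"])


def tokenize(text):
    """Lowercase the text and split it into words, keeping #tags and @names."""
    return re.findall(r"[#@]?\w+", text.lower())


def top_terms(tweets, n=5):
    """Return the n most common non-stopword terms across all tweets."""
    counts = Counter()
    for text in tweets:
        counts.update(t for t in tokenize(text) if t not in STOPWORDS)
    return counts.most_common(n)


# Made-up sample data, purely for illustration
sample_tweets = [
    "RT @MathsJam: Pub at 7!",
    "Looking forward to #mathsjam at the pub",
    "Puzzles and pub with @MathsJam",
]
print(top_terms(sample_tweets, n=3))
```

Swapping the Counter for an NLTK frequency distribution would be the natural next step once the library is installed.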
Comments below or via twitter: @LargeCardinal