Saturday 10 March 2012

PyCon - Day 2 Tutorials (Social Networks & High Performance Python II)

I had two tutorials on today: the first on "Social Network Analysis", and the second on "High Performance Python II".

Social Network Analysis is a fairly loose term that groups a whole bunch of ideas together. A social network is a dataset recording which relationships exist between individuals, and what the properties of those relationships are. From a computer science perspective, that lends itself naturally to a graph-theoretic approach to analysing social networks. The second stage is then the use of statistics to draw conclusions from the data: identifying key members in the network, spotting anomalous behaviour, comparing networks, and lots of other things.
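To make the "identify key members" idea concrete, here is a minimal sketch of degree centrality on a toy graph. The names and the adjacency data are invented for illustration; real work would use a proper graph library such as networkx, which provides this and many other centrality measures.

```python
# A toy social network as an adjacency mapping (invented data).
friends = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob"},
    "dave": {"alice"},
}

def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    return {node: len(neighbours) / (n - 1)
            for node, neighbours in graph.items()}

centrality = degree_centrality(friends)
# The "key member" is simply the node with the highest centrality;
# alice is connected to all three others, so she scores 1.0.
key_member = max(centrality, key=centrality.get)
```

Even this crude measure picks out the obvious hub; the statistical machinery discussed in the tutorial builds on exactly this kind of per-node score.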

The tutorial ran in two interleaved parts: a live-coding demo of how to retrieve data from online social networking services (Twitter and CrunchBase), and a slightly more theoretical discussion of the principles of network analysis.

I think this tutorial was perhaps targeted at a slightly different audience from the one I was expecting. I would have been happy with a fairly rigorous mathematical treatment of the statistics, and how to work with them in Python, but judging by the reactions, I think other people may have had a different opinion.

The second talk, on High Performance Python, blew me away completely. The thought of getting to hear Travis Oliphant (previously at Enthought, and now at Continuum Analytics) speak about numpy was a big deciding factor in coming, and he (and his team) didn't disappoint.

The talk was actually split into four parts:

- an introduction on how to write efficient numpy code (vectorisation, and making best use of numpy's data structures);
- a section on numexpr, a tool for optimising small but critical numpy expressions;
- an impressive live demo of a particle simulation, used to demonstrate measuring and predicting the performance of numpy code; and
- a description of numba (the name comes from numpy + Mamba), a new LLVM-based tool for writing optimised numpy ufuncs.
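The vectorisation point from the first part can be sketched in a few lines. The polynomial below is arbitrary, chosen only to contrast a pure-Python loop with the equivalent whole-array expression; the speed-up comes from the loop running in compiled C inside numpy rather than in the interpreter.

```python
import numpy as np

x = np.linspace(0.0, 10.0, 1_000_000)

def slow(xs):
    # Pure-Python loop: one interpreter round-trip per element.
    return [3.0 * v * v + 2.0 * v + 1.0 for v in xs]

def fast(xs):
    # Vectorised: each operation applies to the whole array at once.
    return 3.0 * xs * xs + 2.0 * xs + 1.0

# Both give the same numbers; only the execution model differs.
assert np.allclose(fast(x[:10]), slow(x[:10]))
```

On arrays of this size the vectorised version is typically one to two orders of magnitude faster, which is exactly the kind of win the tutorial's first section was about.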

After the tutorials finished, I ran into Ned Batchelder in the lobby. I'm a big fan not only of Ned's coverage.py and cog modules, but also of his blog. After that, I picked a random table and started talking to people, and it just so happened that they were the SourceForge development team, who very kindly took me out to a lovely dinner at an Indian restaurant.

Update

One of the libraries mentioned in the social media tutorial was tweepy, a library for pulling data from Twitter.

The other was requests, an improved download library that replaces the (somewhat) broken urllib2.
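A quick taste of why requests is nicer than urllib2: it handles query-string encoding for you. The sketch below builds a request without sending it (the endpoint is a made-up placeholder, not a real API), which is enough to see the encoded URL; in practice you'd just call `requests.get(url, params=...)`.

```python
import requests

# Build (but don't send) a GET request; the URL here is a placeholder.
req = requests.Request(
    "GET",
    "https://api.example.com/search",
    params={"q": "pycon", "count": 10},
)
prepared = req.prepare()
print(prepared.url)  # the query string has been encoded for us
```

With urllib2 you would be assembling and percent-encoding that query string by hand.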

There was some discussion in the High Performance Python II tutorial about combining numexpr with the Intel Math Kernel Library.
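For anyone who hasn't seen numexpr, the basic usage is a one-liner: you hand it the expression as a string and it evaluates it in blocked, multi-threaded chunks, avoiding the intermediate temporary arrays that plain numpy would allocate. A minimal sketch (the arrays and expression are arbitrary):

```python
import numpy as np
import numexpr as ne

a = np.arange(1_000_000, dtype=np.float64)
b = np.arange(1_000_000, dtype=np.float64)

# Plain numpy would allocate temporaries for 2*a and 3*b before adding;
# numexpr compiles the whole expression and streams through the arrays.
result = ne.evaluate("2*a + 3*b")
```

The MKL angle is that numexpr can be built against Intel's VML routines for the transcendental functions, which is where the discussion in the tutorial was heading.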

I was flipping through some of Enthought's products while looking for the Continuum Analytics GitHub repository, and came across Chaco, which looks to be an alternative 2D plotting library.
