Pivotal Webinar, DS Amsterdam and PyData London

It looks like this is going to be a busy week. First up, Noelle Sio, Alexander Kagoshima and I are presenting a webinar on Tuesday about our traffic analysis and prediction work. We talked about this topic at Strata Santa Clara, but this webinar will take an extended look at the data, the challenges and the technology we used. You can sign up on the Pivotal website, and the recording will be available afterwards.

On Wednesday I am heading to Amsterdam to talk at the second Data Science Amsterdam meetup. The organisers of this meetup are branching out from their highly successful Data Science London event, which regularly has hundreds of data scientists on the waiting list for each meeting. My talk on Wednesday will be about how to do massively parallel processing with familiar Python and R packages via the procedural languages PL/Python and PL/R.

On the subject of PL/Python, the video of my talk at PyData London is now available. Thankfully the video was edited to remove five minutes of me banging on the keyboard as my laptop crashed halfway through my talk! In Amsterdam I will be talking about PL/R as well as PL/Python, so hopefully this time the laptop holds up.

And finally, the PyData London conference was such a success that Ian Ozsvald and other organisers are starting a regular PyData meetup in London. We’re very excited at Pivotal to be hosting the first event in our London office on June 3rd. The meetup page is now up and places are filling up fast.



PyData London 2014


View of London from Level 39, the venue for PyData in Canary Wharf

Last weekend the first European PyData event took place in London’s Canary Wharf.

Having been really impressed with the last conference in New York in November, I was looking forward to having PyData closer to home.

With lots of great talks on subjects from machine learning to pharmaceutical drug discovery, the weekend did not disappoint. Ian Ozsvald has written up a good description of all the different activities.

Below I have included the materials from my talk on Massively Parallel Processing with Procedural Python. Due to an unfortunate laptop crash I didn’t get to go through all the slides, but some of the missing material was covered by my colleague Srivatsan Ramanujan and me in New York.

The IPython notebook I used to demonstrate some simple examples is available on GitHub and can also be viewed using nbviewer. The slides embedded here are also on SlideShare:


In addition I thought it would be useful to try to collect as many of the tweets from over the weekend as possible. These are available on Storify. There’s no guarantee I’ve found everything but hopefully there will be some value in having links to some of the slides and other materials people mentioned during their talks.

Update 26/04/2014:

The videos from the PyData London conference are now available, including my talk below. Following the success of the event, a new monthly PyData London meetup has also been started.


PyData: From New York to London

I have been using the Python data ecosystem (consisting of NumPy, Matplotlib, Pandas, and many more) for a few years now, so I was really glad to be able to attend the conference dedicated to all things Python data related, PyData, in its latest incarnation in New York last November.

PyData has been running roughly three times a year since 2012, when the first event was held at the Google campus in Mountain View. Having not been to any of the previous events, I didn’t quite know what to expect from a conference that is quite specific in its scope, unlike, say, Strata or PyCon, which cater to the huge constituencies of data analysis and Python respectively.

With my colleague Srivatsan Ramanujan, I submitted an abstract for a talk and we were really happy to get a slot on Sunday morning. We even managed to get Pivotal involved as a sponsor. We talked about how we use the PyData stack in our data science work at Pivotal, including running PL/Python procedures in a massively parallel way on the Greenplum database. The slides for our talk are embedded below, and the accompanying video is also available (as are all the other PyData talks).

The atmosphere throughout the entire weekend was great, with a real focus on tools and techniques, and not the sales and marketing overkill of some other conferences. It was good to meet so many people involved in creating and maintaining the tools I use on a daily basis, if only to be able to buy them a beer as thanks for their hard work. The consensus between my colleague and me at the end of the weekend was that it would have been well worth going in a personal capacity, even if our employer hadn’t funded the trip (something you can’t say about many conferences).

It has just been announced that the first PyData event of 2014 is going to be in Canary Wharf in London. I would recommend that anyone with an interest in the Python ecosystem for data analysis attend. The tutorials on the first day of the conference are a particularly good way to get up to speed on a topic, whether it’s using IPython notebook, running simulations in PyMC or creating beautiful graphs with Python and D3. I’m hoping work commitments will enable me to be there, so say hello if you see me.