CF Summit Talk on Data Science

A few weeks ago I was able to attend this year’s Cloud Foundry Summit, the conference for all things Cloud Foundry in Santa Clara, California. The event was a great success with lots of interesting talks from developers, operators and customers of CF, and the organisation of the conference itself was splendid, including one of the best exhibition halls I’ve seen, complete with ping pong tables, arcade games and delicious food.


Pivotal Webinar, DS Amsterdam and PyData London

It looks like this is going to be a busy week. First up, Noelle Sio, Alexander Kagoshima and I are presenting a webinar on Tuesday about our traffic analysis and prediction work. We talked about this topic at Strata Santa Clara, but this webinar will take an extended look at the data, the challenges and the technology we used. You can sign up on the Pivotal website and the recording will be available afterwards.

On Wednesday I am heading to Amsterdam to talk at the second Data Science Amsterdam meetup. The organisers of this meetup are branching out from their highly successful Data Science London event, which regularly has hundreds of data scientists on the waiting list for each meeting. My talk on Wednesday will be about how to do massively parallel processing with familiar Python and R packages through the procedural languages PL/Python and PL/R.

On the subject of PL/Python, the video of my talk at PyData London is now available. Thankfully the video was edited to remove five minutes of me banging on the keyboard as my laptop crashed halfway through my talk! In Amsterdam I will be talking about PL/R as well as PL/Python, so hopefully this time the laptop holds up.

And finally, the PyData London conference was such a success that Ian Ozsvald and other organisers are starting a regular PyData meetup in London. We’re very excited at Pivotal to be hosting the first event in our London office on June 3rd. The meetup page is now up and places are filling up fast.



PyData London 2014


View of London from Level 39, the venue for PyData in Canary Wharf

Last weekend the first European PyData event took place in London’s Canary Wharf.

Having been very impressed with the last conference in New York in November, I was really looking forward to having PyData closer to home.

With lots of great talks on subjects from Machine Learning to Pharmaceutical drug discovery, the weekend did not disappoint. Ian Ozsvald has written up a good description of all the different activities.

Below I have included the materials from my talk on Massively Parallel Processing with Procedural Python. Due to an unfortunate laptop crash I didn’t get to go through all the slides, but some of the missing material was covered by my colleague Srivatsan Ramanujan and me in New York.

The IPython notebook I used to demonstrate some simple examples is available on Github and can also be viewed using nbviewer. The slides embedded here are also on Slideshare:


In addition I thought it would be useful to try to collect as many of the tweets from over the weekend as possible. These are available on Storify. There’s no guarantee I’ve found everything but hopefully there will be some value in having links to some of the slides and other materials people mentioned during their talks.

Update 26/04/2014:

The videos from the PyData London conference are now available including my talk below. With the success of the event a new monthly PyData London meetup has also now been started.


How to Beat the Traffic (at Strata)

This week I had the opportunity to attend and speak at one of the biggest Big Data conferences of the year.

O’Reilly’s Strata conferences have been running for the last few years and in many ways have driven the awareness and adoption of data science and predictive analytics.

My colleagues Alexander Kagoshima and Noelle Sio and I talked about recent work we’ve been doing on how to use machine learning techniques to understand traffic flows in major cities and predict when travel disruptions will end. The talk seemed to be well received and generated a lot of questions and comments both at the conference and on Twitter. This recent post on the Pivotal blog explains more about the projects and the overall goals.

As part of the disruption prediction work I built a simple web app which displays the predictions for currently active incidents.

Video of the talk will be available through O’Reilly, and our slides are available on Slideshare:

If you are interested in this or other projects the Pivotal Data Labs team have worked on, there is a lot more information on the official Pivotal site.


Beat the Traffic at Strata 2014

The next few weeks are going to be busy and one of the reasons is that I am fortunate enough to be speaking at this year’s Santa Clara edition of Strata.

Alexander Kagoshima, Noelle Sio and I are talking in the Machine Data session on Thursday 13th February about “Driving the Future of Smart Cities – How to Beat the Traffic”. It’s the last parallel talk of the day, so perfect timing for figuring out how to navigate the Bay Area traffic on the way home.

We’ll be looking at how in-car data sources like GPS locations can enable more intelligent routing that predicts future traffic conditions along your journey.

In addition, we’ve taken a look at traffic disruption data in London and created a model which predicts how long a new incident will last, giving you confidence that the collision which blocked your route to work this morning will have been cleared by the time you want to head home. I’ve written a simple web-based demo which I hope to show during the talk.

Strata talks are videoed (yikes!) and we hope to make our slides available after the talk. Stay tuned as well for a sneak peek at the transport disruption demo.