Data Science & ‘Extreme Programming Explained’

A new event in the Pivotal London office this week was our first lunchtime book club meeting. The first book suggested for discussion was Kent Beck and Cynthia Andres’ Extreme Programming (XP) Explained (2nd Edition).

This is a classic of the agile programming community, and Kent Beck’s shorter first edition (1999) can lay claim to being one of the first books about agile programming practices. In this long post I’m going to start discussing the message of the book and how I think some of its ideas apply in the world of data science.

Continue reading…

 

Python on Cloud Foundry

I’m very happy to be giving a talk at the latest PyData conference in New York this weekend.

This is a long post but I wanted a place to collect all the code I am showing in my talk and to provide a few more resources for those interested in trying out Python on Cloud Foundry further.

Resources Cloud Foundry

What is Cloud Foundry?

My talk is about how to use Python and the PyData stack on Cloud Foundry, the open source cloud platform. Cloud Foundry started life at VMware, and development transferred to Pivotal when it was formed. Cloud Foundry has grown much bigger since then, with over 30 companies joining together to form the Cloud Foundry Foundation, which will guide the development of the open source project.
Continue reading…

 

DataDive toolbelt

What tools do you need to bring to a DataDive? The next DataKind UK DataDive is taking place in two weeks’ time in London. I took part in one of the previous DataDives and I would highly recommend the experience for anyone with data science or analytical skills who wants to help charities use their data.

The DataDives take place over the course of a weekend and in that time you have to decide on a charity to work with, understand their data and goals, perform your analysis and present your results in a usable form. That’s a lot to get through in just over two days, so it’s very important to be able to get up and running quickly with the analysis. I thought it might be useful to list the software and tools that I will be packing in my DataDive toolbelt this time around.

Continue reading…

 

Pivotal Webinar, DS Amsterdam and PyData London

It looks like this is going to be a busy week. First up, Noelle Sio, Alexander Kagoshima and I are presenting a webinar on Tuesday about our traffic analysis and prediction work. We talked about this topic at Strata Santa Clara, but this webinar will take an extended look at the data, the challenges and the technology we used. You can sign up on the Pivotal website and the recording will be available afterwards.

On Wednesday I am heading to Amsterdam to talk at the second Data Science Amsterdam meetup. The organisers of this meetup are branching out from their highly successful Data Science London event, which regularly has hundreds of data scientists on the waiting list for each meeting. My talk on Wednesday will be about how to do massively parallel processing with familiar Python and R packages through the procedural languages PL/Python and PL/R.

On the subject of PL/Python, the video of my talk at PyData London is now available. Thankfully the video was edited to remove five minutes of me banging on the keyboard as my laptop crashed halfway through my talk! In Amsterdam I will be talking about PL/R as well as PL/Python, so hopefully this time the laptop holds up.

And finally, the PyData London conference was such a success that Ian Ozsvald and other organisers are starting a regular PyData meetup in London. We’re very excited at Pivotal to be hosting the first event in our London office on June 3rd. The meetup page is now up and places are filling up fast.

 

 

PyData London 2014

View of London from Level 39, the venue for PyData in Canary Wharf

Last weekend the first European PyData event took place in London’s Canary Wharf.

Having been really impressed with the last conference in New York in November, I was looking forward to having PyData closer to home.

With lots of great talks on subjects from machine learning to pharmaceutical drug discovery, the weekend did not disappoint. Ian Ozsvald has written up a good description of all the different activities.

Below I have included the materials from my talk on Massively Parallel Processing with Procedural Python. Due to an unfortunate laptop crash I didn’t get to go through all the slides, but some of the missing material was covered by my colleague Srivatsan Ramanujan and me in New York.

The IPython notebook I used to demonstrate some simple examples is available on GitHub and can also be viewed using nbviewer. The slides are also on SlideShare.

 

In addition I thought it would be useful to try to collect as many of the tweets from over the weekend as possible. These are available on Storify. There’s no guarantee I’ve found everything but hopefully there will be some value in having links to some of the slides and other materials people mentioned during their talks.

Update 26/04/2014:

The videos from the PyData London conference are now available, including the one of my talk. Following the success of the event, a new monthly PyData London meetup has also been started.

 