In addition, I thought it would be useful to collect as many of the tweets from over the weekend as possible. These are available on Storify. There's no guarantee I've found everything, but hopefully there is some value in having links to some of the slides and other materials people mentioned during their talks.
This week I had the opportunity to attend and speak at one of the biggest Big Data conferences of the year.
O'Reilly's Strata conferences have run for the last few years and have in many ways driven the awareness and adoption of data science and predictive analytics.
My colleagues Alexander Kagoshima and Noelle Sio and I talked about our recent work on using machine learning techniques to understand traffic flows in major cities and predict when travel disruptions will end. The talk seemed to be well received and generated a lot of questions and comments, both at the conference and on Twitter. This recent post on the Pivotal blog explains more about the projects and their overall goals.
As part of the disruption prediction work I built a simple web app which displays the predictions for currently active incidents.
Video of the talk will be available through O’Reilly, and our slides are available on Slideshare:
If you are interested in this or other projects the Pivotal Data Labs team have worked on, there is a lot more information on the official Pivotal site.
We'll be looking at how in-car data sources like GPS locations can enable more intelligent routing that predicts future traffic conditions along your journey.
In addition we’ve taken a look at traffic disruption data in London and created a model which predicts how long a new incident will last, giving you confidence that the collision which blocked your route to work this morning will have been cleared by the time you want to head home. I’ve written a simple web based demo which I hope to show during the talk.
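To give a flavour of the idea (this is an illustrative sketch, not the actual model from our talk, and the data below are made up), one very simple way to predict how much longer an active incident will last is to look at historical incidents that survived at least as long as the current one, and average their remaining durations:

```python
# Illustrative sketch only: estimate the expected remaining duration of an
# active traffic incident from a history of past incident durations.
# The real model in the talk is more sophisticated; this shows the shape
# of the problem. All numbers here are hypothetical.

def expected_remaining(durations, elapsed):
    """Mean remaining duration over historical incidents that lasted at
    least `elapsed` minutes; returns 0.0 if no past incident lasted that long."""
    survivors = [d - elapsed for d in durations if d >= elapsed]
    if not survivors:
        return 0.0
    return sum(survivors) / len(survivors)

# Toy history of past incident durations, in minutes (hypothetical data).
history = [30, 45, 60, 90, 120, 240]

# How much longer might an incident that started 50 minutes ago last?
print(expected_remaining(history, 50))
```

One nice property of conditioning on elapsed time like this is that the estimate updates as the incident drags on: the longer a disruption has already lasted, the more the prediction is dominated by the long-tail incidents in the history.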
Strata talks are videoed (yikes!) and we hope to make our slides available after the talk. Stay tuned as well for a sneak peek at the transport disruption demo.
I have been using the Python data ecosystem (consisting of NumPy, Matplotlib, Pandas, and many more) for a few years now, so I was really glad to be able to attend the conference dedicated to all things Python data related, PyData, in its latest incarnation in New York last November.
PyData has been held roughly three times a year since 2012, when the first event took place at the Google campus in Mountain View. Having not been to any of the previous events, I didn't quite know what to expect from a conference so specific in its scope, unlike, say, Strata or PyCon, which cater to the huge constituencies of data analysis and Python respectively.
With my colleague Srivatsan Ramanujan, I submitted an abstract for a talk and we were really happy to get a slot on Sunday morning. We even managed to get Pivotal involved as a sponsor. We talked about how we use the PyData stack in our data science work at Pivotal, including running procedural PL/Python in a massively parallel way on the Greenplum database. The slides for our talk are embedded below, and the accompanying video is also available (as are all the other PyData talks).
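The general pattern is that you write an ordinary Python function, register its body as a PL/Python user-defined function, and let the database apply it in parallel across groups of rows. As a hedged sketch (the function and table names here are invented for illustration, not taken from our talk):

```python
# Illustrative sketch of the PL/Python pattern: an ordinary Python function
# whose body would become the source of a database UDF. Greenplum can then
# evaluate it in parallel across segments via a GROUP BY.

def fit_slope(xs, ys):
    """Ordinary least-squares slope for one group's (x, y) points.
    In PL/Python, this body would be the UDF's function definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# The corresponding SQL wrapper might look something like this
# (hypothetical names, shown as a comment since it runs in the database):
#
#   CREATE FUNCTION fit_slope(xs float8[], ys float8[]) RETURNS float8
#   AS $$ ...Python body above... $$ LANGUAGE plpythonu;
#
#   SELECT sensor_id, fit_slope(array_agg(x), array_agg(y))
#   FROM readings GROUP BY sensor_id;
```

Because each group's arrays are processed independently, the same Python code you prototype on your laptop scales out across the cluster with essentially no changes.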
The atmosphere throughout the entire weekend was great, with a real focus on tools and techniques, and not the sales and marketing overkill of some other conferences. It was good to meet so many people involved in creating and maintaining the tools I use on a daily basis, if only to be able to buy them a beer as thanks for their hard work. The consensus between my colleague and me at the end of the weekend was that it would have been well worth going in a personal capacity, even if our employer hadn't funded the trip (something you can't say about many conferences).
It has just been announced that the first PyData event of 2014 is going to be in Canary Wharf in London. I would recommend that anyone with an interest in the Python ecosystem for data analysis attend. The tutorials on the first day of the conference are a particularly good way to get up to speed on a topic, whether it's using the IPython notebook, running simulations in PyMC or creating beautiful graphs with Python and D3. I'm hoping work commitments will allow me to be there, so say hello if you see me.
I thought now would be a good time to reflect on 2013 and the changes that have happened in my life.
Officially I have now been out of academia for a year. My postdoc contract ended at the end of 2012, and while I stayed on at QMUL as a visiting researcher, I took the opportunity to travel for a few months with my wife, before setting out to find a new job.
There was no one day that I woke up not wanting to continue in academia, but as my postdoc went on I started to consider what options I would have other than just another postdoc. A big help in this regard was my university's researcher-specific careers advisor. In particular, they organised an event where former postdocs came back to QMUL to describe how they found moving out of academia. All the participants were really honest about their hopes and fears during the transition, which I found refreshing compared to the polarised "it will be awful/great" that one often hears. There has been a lot of talk on Twitter recently about how to prepare grad students for life beyond academia, and I definitely think these kinds of events bring in voices with experience of working outside academia.
Even mainstream publications are now aware that we are in the era of Big Data and that a new role of data scientist has appeared. A data scientist is part programmer, part researcher, part statistician and part domain expert, a combination best illustrated with a Venn diagram. For me the mix of research, programming and mathematics seemed like a really good fit, but I knew I wouldn't have all the necessary skills straight from my postdoc.
After some searching I joined Pivotal, a new company formed earlier this year out of parts of EMC and VMware to build a coherent strategy around those companies' big data assets. I'm now part of the Pivotal Data Science team. We help our customers make the best use of their data to solve specific business problems. It's a great group of people with varied backgrounds and lots of experience dealing with real world (very!) big data problems (we're hiring, by the way).
I joined about six months ago and it's been a great experience so far. The team are really fun to work with and I've learnt a lot about both data science and business. It looks like 2014 is going to be very busy, in a good way.
A physicist by training, I am curious about the world around us, from the smallest to the largest scales. I recently joined the Pivotal Data Science team and now work on interesting data science and predictive analytics projects for a wide range of industries.
As a university researcher I created numerical simulations of cosmological perturbations during the early universe. My code, called Pyflation, is open source and available for download.
This is a personal site and the views and opinions expressed in these pages are strictly mine and have not been reviewed or approved by my employer.