Last weekend I took part in the first London DataDive, a charitable event organised by DataKind, who previously organised similar events across the US. The basic premise is that charities have collected large amounts of data, on donors, fund-raising and the actual care, help or interventions they provide. Without costly analysts to sort through and make sense of the data, it goes unused, providing little or no value to the organisation.
DataKind wants to solve this problem by organising business consultants, data scientists and other analysts to provide pro bono services to the charities over the course of a weekend. The basic format is similar to a hackathon: Friday night is spent networking, learning about the charities' problems and picking one to work with. Saturday is spent working on the data to provide actionable results for the charities. These results are presented on Sunday morning along with any considerations or suggestions from the data scientists.
The three charities at the London event were Oxfam, Place2Be and Keyfund. Intrigued by the talk that Hannah of Keyfund gave on Friday night, I opted to help them over the weekend. Keyfund works with young people to develop their skills and confidence through small projects which are conceived, planned and implemented by the young people themselves. Keyfund coordinates the assessment and funding of these projects through partnerships with local organisations across the country.
Over the weekend we analysed Keyfund’s data in a number of ways. In particular we considered the demographics of the children in the scheme, quantified the outcomes in terms of self-assessments and skills profiles, and assessed the likely effect of streamlining their process into fewer stages. Hopefully the results will be of use to Hannah and the Keyfund team in assessing their procedures and convincing funders to support this worthy cause.
On the technical side I took this opportunity to learn more about the pandas library by Wes McKinney, which provides a structured data companion to NumPy’s more homogeneous arrays. The accompanying jargon is quite similar to R, with data frames and series in place of arrays and vectors. Some elements took a bit of getting used to, but one powerful feature is the deep integration with Matplotlib, which allows easy creation of histograms and box plots from data frames. I hope to look more into pandas, having just bought Wes McKinney’s new book “Python for Data Analysis”.
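To give a flavour of that Matplotlib integration, here is a minimal sketch using a recent pandas version. The data is entirely made up for illustration (it is not Keyfund's data), and the column names `stage_1` and `stage_2` are hypothetical stand-ins for self-assessment scores at two points in a scheme.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up example data: self-assessment scores for 50 participants
# at two hypothetical stages (not real Keyfund data).
rng = np.random.default_rng(0)
scores = pd.DataFrame({
    "stage_1": rng.integers(1, 8, size=50),   # scores from 1 to 7
    "stage_2": rng.integers(3, 11, size=50),  # scores from 3 to 10
})

fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

# One overlaid histogram per column, drawn straight from the data frame
scores.plot.hist(bins=10, alpha=0.6, ax=ax_hist)
ax_hist.set_xlabel("self-assessment score")

# One box per column, again straight from the data frame
scores.plot.box(ax=ax_box)
ax_box.set_ylabel("self-assessment score")

fig.tight_layout()
```

Calling `fig.savefig("scores.png")` afterwards writes the figure to disk. In an interactive session you could instead drop the `matplotlib.use("Agg")` line and call `plt.show()`.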
I thoroughly enjoyed the first international DataDive and appreciate the work that organisers Jake Porway and Craig Barowsky put in to make everything run smoothly. The atmosphere was great throughout the weekend, including late into the night on Saturday, and the participation from everyone involved was inspiring. At a time when the gender imbalance in science and technology is making headlines, it was also great to see an event where this wasn’t an issue in the slightest. Overall I would heartily recommend that anyone involved in data give something back to their community by participating in one of these events. Plans are under way for more events of this kind in London and I will be jumping at the chance to get involved again.
Update: I just noticed that Dirk Gorissen, who was on my team, has a nice writeup with some results (including one of my graphs).