Towards Cloud Native Data Science

By Ian

November 17, 2015

In my talk at ODSC West, I wanted to start a conversation about what value, if any, the idea of Cloud Native Applications has for data science. The video and slides from my presentation are below, and the slides are also available without speaker notes.

If you haven’t heard of Cloud Native Applications, the idea is to write applications that take full advantage of cloud deployment while respecting the limitations and constraints of the platform.

Some of these suggested practices have been collected into a list of 12 ‘factors’, which include making your app stateless, explicitly declaring dependencies, and ensuring parity between development and production environments. A colleague of mine at Pivotal gives a good introduction to the topic in this free O’Reilly book.

I want to work out which additional factors are specific to data science, and in my talk I proposed an initial list of three:

Reproducibility of models,
Models exposed as services,
Explicit configuration of data pipelines.
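These three factors are ideas rather than a recipe, but a small sketch can make them more concrete. The Python below is a minimal illustration under stated assumptions, not the approach from the talk: it uses a toy linear model and hypothetical environment variable names (`PIPELINE_INPUT_PATH`, `PIPELINE_SEED`, `PIPELINE_TRAIN_FRACTION`, `PORT`). The pipeline configuration is declared explicitly and read from the environment, all randomness is seeded from that configuration so retraining gives the same model, and the trained model is exposed over HTTP as a small WSGI service using only the standard library.

```python
import hashlib
import json
import os
import random
from dataclasses import asdict, dataclass
from urllib.parse import parse_qs


# Hypothetical pipeline configuration: every setting is declared explicitly
# and read from environment variables (with defaults) instead of being hard-coded.
@dataclass(frozen=True)
class PipelineConfig:
    input_path: str       # where a real pipeline would read its data (unused in this toy)
    seed: int             # seeds all randomness, for reproducibility
    train_fraction: float

    @classmethod
    def from_env(cls) -> "PipelineConfig":
        return cls(
            input_path=os.environ.get("PIPELINE_INPUT_PATH", "data/train.csv"),
            seed=int(os.environ.get("PIPELINE_SEED", "42")),
            train_fraction=float(os.environ.get("PIPELINE_TRAIN_FRACTION", "0.8")),
        )

    def fingerprint(self) -> str:
        # Stable hash of the configuration so a model can be traced back
        # to the exact settings that produced it.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


def train(xs, ys, config: PipelineConfig) -> dict:
    # Seeded RNG: the same config always yields the same train split.
    rng = random.Random(config.seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    train_idx = idx[: int(len(idx) * config.train_fraction)]
    if len(train_idx) < 2:
        raise ValueError("need at least two training points")
    tx = [xs[i] for i in train_idx]
    ty = [ys[i] for i in train_idx]
    # Ordinary least squares for y = intercept + slope * x.
    mean_x, mean_y = sum(tx) / len(tx), sum(ty) / len(ty)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(tx, ty))
    var = sum((x - mean_x) ** 2 for x in tx)
    slope = cov / var
    return {
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
        "config_id": config.fingerprint(),
    }


def make_app(model: dict):
    # Expose the model as a service: GET /?x=3.5 returns a JSON prediction.
    def app(environ, start_response):
        params = parse_qs(environ.get("QUERY_STRING", ""))
        headers = [("Content-Type", "application/json")]
        try:
            x = float(params["x"][0])
        except (KeyError, ValueError):
            start_response("400 Bad Request", headers)
            return [json.dumps({"error": "expected query parameter x"}).encode("utf-8")]
        prediction = model["intercept"] + model["slope"] * x
        start_response("200 OK", headers)
        body = {"prediction": prediction, "config_id": model["config_id"]}
        return [json.dumps(body).encode("utf-8")]

    return app


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    config = PipelineConfig.from_env()
    xs = [float(i) for i in range(20)]      # stand-in data instead of reading input_path
    ys = [2.0 * x + 1.0 for x in xs]
    model = train(xs, ys, config)
    port = int(os.environ.get("PORT", "8000"))  # port supplied by the platform
    with make_server("", port, make_app(model)) as server:
        server.serve_forever()
```

Because the model carries the fingerprint of the configuration that produced it, every prediction the service returns can be traced back to a specific, reproducible training run.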

I’d be really interested in hearing other ideas for the list and any other comments. You can contact me via Twitter or through the comments below. Update: The video of the talk is now up.