Towards Cloud Native Data Science

In my talk at ODSC West I wanted to start a conversation about what value, if any, the idea of Cloud Native Applications has for data science. The video and slides from my presentation are below, and the slides are also available without speaker notes.

If you haven’t heard of Cloud Native Applications, the idea is to write applications that take full advantage of cloud deployment while respecting the limitations and constraints of the platform.

Some of these practices have been collected into a list of 12 ‘factors’, which include making your app stateless, explicitly declaring dependencies, and ensuring parity between development and production environments. My colleague at Pivotal gives a good introduction to the topic in this free O’Reilly book.

I want to figure out which additional factors are specific to data science, and in my talk I identified an initial list of three:

  • Reproducibility of models,
  • Models exposed as services,
  • Explicit configuration of data pipelines.
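To make the first and third factors a little more concrete, here is a minimal sketch of what explicit pipeline configuration with a reproducible step might look like. It assumes settings are read from environment variables (as the twelve-factor approach recommends for config), and every name in it (`PipelineConfig`, `split_rows`, the `PIPELINE_*` variables) is hypothetical and for illustration only; it is not taken from the talk.

```python
import json
import os
import random
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PipelineConfig:
    # Hypothetical settings for an illustrative pipeline; all are explicit,
    # so nothing about a run depends on hidden state.
    input_path: str
    model_type: str
    train_fraction: float
    random_seed: int

    @classmethod
    def from_env(cls, env=None):
        # Read configuration from the environment (or a supplied mapping),
        # falling back to documented defaults.
        env = os.environ if env is None else env
        return cls(
            input_path=env.get("PIPELINE_INPUT_PATH", "data/input.csv"),
            model_type=env.get("PIPELINE_MODEL_TYPE", "logistic_regression"),
            train_fraction=float(env.get("PIPELINE_TRAIN_FRACTION", "0.8")),
            random_seed=int(env.get("PIPELINE_RANDOM_SEED", "42")),
        )


def split_rows(rows, config):
    # Shuffle with a local RNG seeded from the config, so the same config
    # always produces the same train/test split.
    rng = random.Random(config.random_seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * config.train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    config = PipelineConfig.from_env()
    # Record the exact configuration alongside any outputs of the run.
    print(json.dumps(asdict(config), sort_keys=True))
    train, test = split_rows(range(10), config)
    print(len(train), len(test))
```

Because the configuration is an explicit, serialisable object and the random seed is part of it, logging the printed JSON next to a trained model is one simple way to make that model reproducible later.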

I’d be very interested to hear other ideas for the list, as well as any other comments. You can contact me via Twitter or through the comments below.

Update: The video of the talk is now up.


Ian

A physicist by training, I am curious about the world around us, from the smallest to the largest scales. I am now part of the Pivotal Data Science team and work on data science and predictive analytics projects across a wide range of industries. On Twitter I'm @ianhuston, and on GitHub I'm ihuston.
