Flexible conda package dependencies on Cloud Foundry

The official Cloud Foundry Python buildpack supports conda environments through an environment.yml file. This gives you a lot of flexibility in managing Python (and other) dependencies, and lets you use packages from other public and private sources, including ones you have built locally.

Using other conda channels

Conda-forge

This lets you reference other channels, such as conda-forge, and get packages from there instead of from the default channel provided by Anaconda.

We can specify conda-forge as a channel in our environment.yml file and then install a package like prettytable from it. This gives you access to the latest conda packages from any channel you can reach.

name: myenv
channels:
  - conda-forge
dependencies:
  - python=3.6
  - flask
  - gunicorn
  - prettytable

 

Vendoring conda packages with your app

However, what happens if you need a particular package but you can’t access the public internet from your CF installation? In the Python buildpack, the pip-based approach supports the concept of “vendoring”: providing dependency packages alongside your application code when uploading to CF.

You can do this with conda packages as well, by pointing to a local conda channel created in your app directory.

If you are packaging your own library, there are lots of tutorials on how to build a conda package for it, depending on the complexity. If you need an existing third-party library you can use any of the existing conda packages from anaconda.org or conda-forge.

The next step is to create a local conda channel in your app directory. This is essentially a set of directories, one for each platform you need (osx-64, linux-64, win-64, and noarch for pure Python packages). You tell conda about them by creating a channel index for each platform directory:

conda index ./vendor/noarch
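The directory layout described above can be sketched in a few lines of Python. This is only an illustration, not part of the buildpack: the function name build_vendor_channel is hypothetical, and the choice of linux-64 plus noarch assumes that the CF staging container runs 64-bit Linux.

```python
from pathlib import Path
import shutil

# Platform subdirectories to create inside the channel.
# Assumption: the CF staging container is 64-bit Linux, so only
# linux-64 and noarch (pure Python packages) are needed.
SUBDIRS = ("linux-64", "noarch")


def build_vendor_channel(app_dir, packages, channel_name="vendor"):
    """Copy built conda packages into <app_dir>/<channel_name>/<subdir>/.

    `packages` maps a subdir name (for example "noarch") to a list of
    package file paths. Returns the channel root directory.
    """
    channel = Path(app_dir) / channel_name
    for subdir in SUBDIRS:
        (channel / subdir).mkdir(parents=True, exist_ok=True)
    for subdir, files in packages.items():
        if subdir not in SUBDIRS:
            raise ValueError(f"unexpected subdir: {subdir}")
        for package_file in files:
            target = channel / subdir / Path(package_file).name
            shutil.copy2(package_file, target)
    return channel
```

The sketch only arranges the files. Each platform directory still needs to be indexed with conda index, as in the command above, before conda will treat it as a channel.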

Once you have this channel in place in your app, you need to tell the CF buildpack about it by providing a local file path to the channel.

During the staging phase of CF app deployment, the buildpack installs your app and all its dependencies in a temporary container, where the app's absolute path is /tmp/app. So we can provide conda with a reference to our local channel, knowing it will be located at /tmp/app/vendor.

name: myenv
channels:
  - /tmp/app/vendor
dependencies:
  - python=3.6
  - flask
  - gunicorn
  - mypkg

When we cf push, our local conda package file is uploaded along with our app, and the Python buildpack installs it along with any other required dependencies. In this way you can mix local and public packages easily.

If you want to see this in action, I’ve created a very simple example package, and a corresponding CF app that installs this package from a local channel, as well as getting prettytable from conda-forge. For more details on using Python on Cloud Foundry, take a look at my tutorial.

 

Python on Cloud Foundry

I’m very happy to be giving a talk at the latest PyData conference in New York this weekend.

This is a long post, but I wanted a place to collect all the code I am showing in my talk and to provide a few more resources for those interested in trying out Python on Cloud Foundry further.


What is Cloud Foundry?

My talk is about how to use Python and the PyData stack on Cloud Foundry, the open source cloud platform. Cloud Foundry started life at VMware, and development transferred to Pivotal when it was formed. Cloud Foundry has grown much bigger since then, with over 30 companies joining together to form the Cloud Foundry Foundation, which will guide the development of the open source project.

 

DataDive toolbelt

What tools do you need to bring to a DataDive? The next DataKind UK DataDive is taking place in two weeks' time in London. I took part in one of the previous DataDives and I would highly recommend the experience for anyone with data science or analytical skills who wants to help charities use their data.

The DataDives take place over the course of a weekend, and in that time you have to decide on a charity to work with, understand their data and goals, perform your analysis and present your results in a usable form. That’s a lot to get through in just over two days, so it’s very important to be able to get up and running quickly with the analysis. I thought it might be useful to list the software and tools that I will be packing in my DataDive toolbelt this time around.

 

Pyinspire – Python script to access INSPIRE database

The new INSPIRE HEP database has been up and running for a while now, and is going from strength to strength (despite some recent wobbles concerning citation counts).

I recently needed to get the BibTeX entries for a few papers and, instead of copy-pasting from the results web page each time, I wondered whether a more programmatic solution existed. There used to be a few utilities for SPIRES which enabled you to get results using a script, but I haven’t seen any that do this with INSPIRE (although there is a plugin for JabRef).

I decided to cook up a quick script called pyinspire that will send INSPIRE a query and scrape the resulting page for results. It is available now in the Python Package Index with the source code on Bitbucket. I’ve released it under a modified BSD license so feel free to fork. Installation is as easy as pip install pyinspire or easy_install pyinspire.

The functionality is very basic at the moment but does include output in BibTeX, LaTeX(EU) and LaTeX(US) modes, and standard text output including citation counts. See the Bitbucket page for more details.

 