Advent of Code 2017 in Kotlin

I talk about Python a lot here but at work we also write quite a bit of Java (with Spring of course). Since it hit 1.0 in 2016, I’ve been hearing more and more about Kotlin as a viable alternative to standard Java on the JVM.

At Pivotal we’re seeing clients start to use Kotlin because of its brevity compared to Java, its easy interoperability with existing Java-based libraries, its adoption by Google for Android, and the (relatively new) ability to also target JavaScript.

To find out what all the fuss is about, I decided to try out Kotlin for last month’s Advent of Code set of programming puzzles.

True to its reputation, it was easy to get a project started in Kotlin (using IntelliJ) and familiar tools like JUnit for testing are available without any trouble.

The puzzles in Advent of Code are of a particular type: mainly computational tasks, with some algorithmic tricks often necessary for the more difficult second half of each challenge. So this was really a test of how well I could use Kotlin for these kinds of problems, rather than, say, doing a lot of data cleaning or creating an Android app. The well-written problem descriptions lend themselves to a TDD approach, as they almost always include examples that capture the complexity of the problem.

You can take a look at my solutions for the first 21 days of problems on Github. In the run up to Christmas, family preparations and celebrations took precedence over coding puzzles, but I’ll update the repo if I get time to complete the last few days.

It’s early days for my use of Kotlin, but so far I’ve found the language to be a nice mix of my favourite parts of Python (simplicity and flexibility) and Java (type safety and language specific tooling). Recently there has even been a movement to build a community of data scientists using Kotlin, and Thomas Nield’s videos are a good way to get started in this area.

 

Flexible conda package dependencies on Cloud Foundry

The official Python Cloud Foundry buildpack has support for conda environments using the environment.yml file. This provides a lot of flexibility for Python (and other) dependencies and helps you to use packages from other public and private sources including your own locally built ones.

Using other conda channels


One thing this allows you to do is reference other channels such as conda-forge, and get packages from there instead of the standard Anaconda provided default channel.

We can specify conda-forge as a channel in our environment.yml file and then install a package like prettytable. This enables a lot of flexibility to access the latest and greatest conda packages from anywhere.

name: myenv
channels:
  - conda-forge
dependencies:
  - python=3.6
  - flask
  - gunicorn
  - prettytable

 

Vendoring conda packages with your app

However, what happens if you need a particular package but you can’t access the public internet from your CF installation? In the Python buildpack, the pip-based approach supports the concept of “vendoring”, providing dependency packages alongside your application code when uploading to CF.

You can do this with conda packages as well, by pointing to a local conda channel created in your app directory.

If you are packaging your own library, there are lots of tutorials on how to build a conda package for it, depending on the complexity. If you need an existing third-party library you can use any of the existing conda packages from anaconda.org or conda-forge.

The next step is to create a local conda channel in your app directory. This is essentially a set of directories, one for each platform you need (osx-64, linux-64, win-64, and noarch for pure Python packages), which you tell conda about by creating a channel index for each directory:

conda index ./vendor/noarch
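Before pushing, it can be handy to check that the index actually picked up your package. Here is a minimal sketch, assuming the channel lives in a vendor directory as above and that each indexed subdirectory contains a repodata.json file with a "packages" key (newer conda versions may also list .conda files under a separate key, which this sketch ignores). The function name is my own invention for illustration.

```python
import json
from pathlib import Path


def list_channel_packages(channel_dir):
    """Map each indexed subdirectory of a local channel to its package files."""
    found = {}
    for subdir in sorted(Path(channel_dir).iterdir()):
        if not subdir.is_dir():
            continue  # skip stray files at the channel root
        repodata = subdir / "repodata.json"
        if not repodata.is_file():
            continue  # directory has not been indexed yet
        with repodata.open() as handle:
            index = json.load(handle)
        # Package file names are the keys of the "packages" mapping
        found[subdir.name] = sorted(index.get("packages", {}))
    return found
```

Calling list_channel_packages("vendor") should then return a dictionary keyed by platform directory, which makes a missing or un-indexed package easy to spot.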

Once you have this channel in place in your app, you need to tell the CF buildpack about it by providing a local file path to the channel.

During the staging phase of a CF app deployment, the buildpack installs your app and all its dependencies in a temporary container, where the app directory has the absolute path /tmp/app. So we can give conda a reference to our local channel, knowing it will be located at /tmp/app/vendor.

name: myenv
channels:
  - /tmp/app/vendor
dependencies:
  - python=3.6
  - flask
  - gunicorn
  - mypkg

When we cf push, our local conda package file will be uploaded along with our app, and the Python buildpack will install it along with any other dependencies required. In this way you can mix local and public packages easily.

If you want to see this in action, I’ve created a very simple example package, and a corresponding CF app that installs this package from a local channel, as well as getting prettytable from conda-forge. For more details on using Python on Cloud Foundry, take a look at my tutorial.

 

Bringing a Python Django app to Cloud Foundry in 2017

In this post I want to answer the question:
What do you have to do to run a Django web app on Cloud Foundry in 2017?

In the past, a few other people have described their approaches, but given that Cloud Foundry is continuously changing and improving, I thought it would be good to revisit the topic and learn about Python & Django support in 2017.

Python on Cloud Foundry

Cloud Foundry is a polyglot application deployment system. At Pivotal [disclosure: where I work, but not on the Cloud Foundry team], we put a lot of emphasis on how great a home Cloud Foundry is for Java Spring applications, and we’ve always been fond of Ruby on Rails.

That doesn’t mean other languages are hard to run on CF though. Following the example of Heroku, CF uses ‘buildpacks’ to provide official support for many languages, and community support for many more.

Python is an officially supported language for CF, and the official buildpack is maintained and updated by the buildpacks team. This gives me confidence that I can rely on the Python buildpack to have up-to-date interpreters and saves me the hassle of finding or creating a custom buildpack.

Pre-requisites

I’ve been going through the updated 2nd edition of the ‘Obey the Testing Goat’ book, otherwise known as Test Driven Development With Python by Harry J.W. Percival.

In the book you build up a Django application from scratch using a TDD approach. I’m going to deploy this ‘Superlists’ to-do list application on to Cloud Foundry.

If you want to follow along you should have completed all the exercises up to and including Chapter 10, which includes adding gunicorn to requirements.txt. You can take a look at my version of the app at this point.

If you want to skip ahead and see all the changes we’ll make to the app, have a look at this commit.

First we’ll start as always by checking that our functional tests run successfully on our local machine.

$ python manage.py test functional_tests

All green, so we’re good to go!

Getting ready for Cloud Foundry

We are going to push our application to Cloud Foundry, which will create a domain name for us. Let’s use the STAGING_SERVER variable to test this. I am aiming for the domain ih-superlists.cfapps.io; yours will vary based on your Cloud Foundry provider.

$ STAGING_SERVER=ih-superlists.cfapps.io python manage.py test functional_tests

As expected the tests fail completely.

Let’s get started on deploying to Cloud Foundry. We need to provide a ‘manifest’ file which tells Cloud Foundry how to deploy our application.

manifest.yml

---
applications:
- name: ih-superlists
  memory: 512M
  instances: 1
  buildpack: python_buildpack
  command: gunicorn superlists.wsgi:application

Then you can try to deploy using $ cf push and look at the logs with $ cf logs ih-superlists.

If your CF setup is like mine you’ll see

... [APP/PROC/WEB/0] ERR   File "/home/vcap/app/lists/views.py", line 16
    [APP/PROC/WEB/0] ERR     return redirect(f'/lists/{list_.id}/')
    [APP/PROC/WEB/0] ERR                                         ^
    [APP/PROC/WEB/0] ERR SyntaxError: invalid syntax

Oops! We forgot that CF expects to run Python 2 applications by default (boo!). Let’s tell it our application doesn’t use legacy Python.

runtime.txt

python-3.6.2

And then $ cf push again.

We also need to add our domain to ALLOWED_HOSTS in our settings file.

superlists/settings.py

ALLOWED_HOSTS = ['ih-superlists.cfapps.io']
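If you would rather not hard-code the domain, one option is to read it from the VCAP_APPLICATION environment variable that Cloud Foundry sets for each app. This is only a sketch and not what the Superlists app does: it assumes the JSON in that variable carries an "application_uris" list, and the helper name and the local fallback hosts are my own choices.

```python
import json
import os


def hosts_from_vcap(environ=os.environ):
    """Return the routes Cloud Foundry has mapped to this app, if any."""
    raw = environ.get("VCAP_APPLICATION")
    if not raw:
        return []  # not running on Cloud Foundry, e.g. local development
    return list(json.loads(raw).get("application_uris", []))


# In settings.py this could stand in for the hard-coded list;
# the fallback hosts are an assumption for local runs.
ALLOWED_HOSTS = hosts_from_vcap() or ["localhost", "127.0.0.1"]
```

The trade-off is that the setting then follows whatever routes are mapped to the app, which is convenient but less explicit than listing the domain yourself.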

Now we can see our (non-CSS’d) site running at ih-superlists.cfapps.io! Let’s run our functional tests again.

$ STAGING_SERVER=ih-superlists.cfapps.io python manage.py test functional_tests

All three tests still fail!

Serving static files

One of the problems is that our static files are not being served properly. In our logs we can see the requests for our static files:

... [APP/PROC/WEB/0] ERR Not Found: /static/base.css
    [APP/PROC/WEB/0] ERR Not Found: /favicon.ico
    [APP/PROC/WEB/0] ERR Not Found: /static/bootstrap/css/bootstrap.min.css
    [APP/PROC/WEB/0] ERR Not Found: /static/base.css

The CF Python buildpack actually runs collectstatic as part of its process. Where are these files going? We can look inside the container by connecting with $ cf ssh ih-superlists.

The files are being collected during the staging process into /tmp/app/static, but this directory is not available in the eventual container that runs the application. Hence the lack of static files for our app!

Let’s collect our static files just before we start the gunicorn server instead.

manifest.yml

  command: python manage.py collectstatic --noinput && gunicorn superlists.wsgi:application

From the logs we can see that the static files are now in /home/vcap/static.

Side note: The VCAP acronym stands for VMware Cloud Application Platform, which was the original name of Cloud Foundry when it started at VMware.

We can run our functional tests again, or look at the live site and see that this hasn’t fixed our static files problem. We now have the static files, but they are not being served by gunicorn.

One way to fix this is to gather these files and serve them with another Cloud Foundry app which uses the static buildpack. We only expect a small amount of traffic for our application so in this case we can try to serve these files from the same server, using the Whitenoise Python library.

Add Whitenoise to your requirements.txt and then update the settings to include it in the Django middleware that is used.

$ pip install whitenoise
$ pip freeze | grep whitenoise >> requirements.txt

superlists/settings.py

MIDDLEWARE_CLASSES = [
  'django.middleware.security.SecurityMiddleware',
  'whitenoise.middleware.WhiteNoiseMiddleware',
  # ...
]

We can now see our site is served with CSS, but the functional tests still fail.

Adding a managed database

We can also see the problem in the logs.

... [APP/PROC/WEB/0] ERR django.db.utils.OperationalError: unable to open database file

Uh oh, we didn’t initialise our database. At this point we need to move away from the file-based SQLite database, which will be purged (along with all other files) each time we push the application. Let’s fix this by using data services provided with CF.

First let’s create a PostgreSQL database. Here I’m using the free tier provided by ElephantSQL on Pivotal Web Services.

$ cf marketplace
Getting services from marketplace in org ianhuston / space testing as XXX...
OK

service                       plans                                                                                description
...
elephantsql                   turtle, panda*, hippo*, elephant*                                                    PostgreSQL as a Service
...
* These service plans have an associated cost. Creating a service instance will incur this cost.

TIP:  Use 'cf marketplace -s SERVICE' to view descriptions of individual plans of a given service.

Let’s look at the ElephantSQL plans in depth:

$ cf marketplace -s elephantsql
Getting service plan information for service elephantsql as XXX...
OK

service plan   description                                            free or paid
turtle         4 concurrent connections, 20MB Storage                 free
panda          20 concurrent connections, 2GB Storage                 paid
hippo          300 concurrent connections, 100 GB Storage             paid
elephant       300 concurrent connections, 1000 GB Storage, 500Mbps   paid

Looks like the turtle plan will suit us. Let’s create a service on that plan.

$ cf create-service elephantsql turtle mydb

Next we attach this service to our app and restage as it suggests.

$ cf bind-service ih-superlists mydb
$ cf restage ih-superlists

We can now see our database connection variable in the environment of our app.

$ cf env ih-superlists
...
System-Provided:
{
 "VCAP_SERVICES": {
  "elephantsql": [
   {
    "credentials": {
     "max_conns": "5",
     "uri": SUPER_SECRET_URI
    },
    "label": "elephantsql",
    "name": "mydb",
    "plan": "turtle",
...

But how will our Django app know to use this database? We need to give these credentials to the application. One important thing to know is that the URI from the VCAP_SERVICES environment variable will also be provided to our application in the DATABASE_URL variable. This is the same way Heroku apps receive database credentials, and it gives us the opportunity to use the small dj_database_url library from Kenneth Reitz.
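For cases where you need the credentials of a specific bound service rather than whatever ends up in DATABASE_URL, the JSON can also be read directly. A minimal sketch, assuming the structure shown in the cf env output above (service label, then a list of instances, each with a name and a credentials block); the function name is hypothetical.

```python
import json
import os


def service_uri(service_name, environ=os.environ):
    """Find the credentials URI of a bound service instance by its name."""
    services = json.loads(environ.get("VCAP_SERVICES", "{}"))
    # VCAP_SERVICES maps each service label to a list of bound instances
    for instances in services.values():
        for instance in instances:
            if instance.get("name") == service_name:
                return instance.get("credentials", {}).get("uri")
    return None  # no bound service with that name
```

With the service created earlier, service_uri("mydb") would return the same URI that shows up in the credentials block.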

Install the dj-database-url package locally using pip, add it to your requirements.txt, and then let’s change our settings.

superlists/settings.py

import dj_database_url
...
#DATABASES = {
#    'default': {
#        'ENGINE': 'django.db.backends.sqlite3',
#        'NAME': os.path.join(BASE_DIR, '../database/db.sqlite3'),
#    }
#}

LOCAL_SQLITE='sqlite:///' + os.path.abspath(os.path.join(BASE_DIR, '../database/db.sqlite3'))
DATABASES = {}
DATABASES['default'] = dj_database_url.config(default=LOCAL_SQLITE)

The dj_database_url.config function automatically looks for the DATABASE_URL environment variable, and here we also give it a default to use when running locally. We should run our local tests again to check this still works.
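To make it a bit less magical, here is roughly the kind of translation that happens for a PostgreSQL URL. This is a simplified sketch using only the standard library, not the library’s actual implementation; the function name is made up, and the real package handles many more schemes and options.

```python
from urllib.parse import unquote, urlparse


def postgres_settings(url):
    """Turn a postgres:// URL into a Django-style database settings dict."""
    parts = urlparse(url)
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": parts.path.lstrip("/"),
        # Credentials may be percent-encoded inside a URL, so decode them
        "USER": unquote(parts.username or ""),
        "PASSWORD": unquote(parts.password or ""),
        "HOST": parts.hostname or "",
        "PORT": str(parts.port or ""),
    }
```

Seeing the pieces laid out like this also explains why a single environment variable is enough to switch databases between local runs and Cloud Foundry.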

Now we need to initialise our PostgreSQL database. We can do this using a one-off task with the relatively new cf run-task command. First push the application.

$ cf push ih-superlists

Then run the database initialisation as a task.

$ cf run-task ih-superlists "python manage.py migrate" --name migrate

You can check the status of a task by looking at $ cf tasks ih-superlists.

Once the migration task is finished, we can run our functional tests again.

$ STAGING_SERVER=ih-superlists.cfapps.io python manage.py test functional_tests

Success!

Let’s make one final change to turn off debug mode.

superlists/settings.py

DEBUG = False
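In the spirit of keeping configuration in the environment, you could also drive this setting from an environment variable instead of editing the file. A small sketch, where DJANGO_DEBUG is a hypothetical variable name I picked for this example and debug mode stays off unless it is explicitly enabled:

```python
import os


def env_flag(name, default=False, environ=os.environ):
    """Read a boolean setting from an environment variable."""
    value = environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


# DJANGO_DEBUG is a hypothetical variable name chosen for this example
DEBUG = env_flag("DJANGO_DEBUG", default=False)
```

Locally you would then run with DJANGO_DEBUG=true in your shell, while the deployed app keeps debug mode off by default.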

Summary

We walked through a few steps there to get our Django app up and running on Cloud Foundry. Some of these are CF specific, and some are more about making our Django app more ‘cloud native’ in the spirit of the 12 factors. All the changes we made can be seen in this commit. You can also see all the code for the CF-enabled version of the Superlists app so far.

Let’s recap:

  1. Create a manifest.yml file with CF specific information.
  2. Create a runtime.txt file to specify Python version.
  3. Add your expected URL to ALLOWED_HOSTS
  4. Use Whitenoise to serve static files.
  5. Use a data service to create a database and connect it to Django.
  6. Initialise the database and run all migrations.
  7. Turn off debug mode.
  8. cf push your way to Django on CF!

Hopefully this is useful for you to get your Django app running on Cloud Foundry. Let me know in the comments if you have any other tips!

 

Mapping Dublin parish boundaries

TLDR: Go straight to the Dublin Parish Boundaries map.

In Ireland, most primary schools are run by the Catholic Church, and their enrolment policies often include complex lists of rules, with children living in the local parish frequently given preference. This means that when you are looking for accommodation to rent or buy, it can be very important to know in advance which parish the property is located in.

The Dublin Archdiocese has a map of all the churches in Dublin but unfortunately it doesn’t seem to be working at the moment. Individual parishes sometimes have maps although these are often either static scanned documents or sometimes even hand-drawn sketches.

So how can we make these parish boundaries available on a modern map interface?

Fortunately for our purposes, the Catholic parishes are such an integral part of Irish society that the national Central Statistics Office reports their boundaries as part of its census data. This data is available under a custom non-commercial license from Ordnance Survey Ireland.

The data the CSO provide is in the form of Shapefiles but we can convert them to the more palatable GeoJSON format using the ogr2ogr utility from GDAL:

ogr2ogr -f GeoJSON -t_srs crs:84 new_file.geojson original_file.shp
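After converting, a quick sanity check of the output helps confirm that all the boundaries made it across. Here is a minimal sketch using only the standard library; it assumes the converted file is a GeoJSON FeatureCollection, and the function name is my own.

```python
import json
from collections import Counter


def summarise_geojson(path):
    """Count features by geometry type and collect the property names used."""
    with open(path, encoding="utf-8") as handle:
        collection = json.load(handle)
    features = collection.get("features", [])
    # Features with a null geometry are counted under the key None
    geometry_types = Counter(
        (feature.get("geometry") or {}).get("type") for feature in features
    )
    property_names = sorted(
        {key for feature in features for key in (feature.get("properties") or {})}
    )
    return dict(geometry_types), property_names
```

Running summarise_geojson("new_file.geojson") shows how many polygons were written and which attribute columns survived, which is useful for deciding what to display for each parish.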

GitHub provides a really useful GeoJSON renderer, both on its own site and for embedded maps. This means we don’t have to worry about creating a map and adding the parish boundaries as a layer.

The final piece of the puzzle is how to make the map available on the web. For this I used Cloud Foundry and in particular Pivotal’s hosted Cloud Foundry instance called Pivotal Web Services. [Disclaimer: I work for Pivotal but not on Cloud Foundry.]

I made a simple HTML page and using the Staticfile buildpack I was able to just do cf push to get the Dublin parish boundaries map up and running.

The final GeoJSON map as rendered by Github

One note of caution: the parish boundaries in the Census data may not correspond to those used by parishes or schools so please double check carefully before making any life-changing decisions!

 