Advent of Code 2017 in Kotlin

I talk about Python a lot here but at work we also write quite a bit of Java (with Spring of course). Since it hit 1.0 in 2016, I’ve been hearing more and more about Kotlin as a viable alternative to standard Java on the JVM.

At Pivotal we’re seeing clients start to use Kotlin because of its brevity compared to Java, the ease of use of existing Java-based libraries, the adoption by Google for Android, and the (relatively new) ability to also target JavaScript.

To find out what all the fuss is about, I decided to try out Kotlin for last month’s Advent of Code set of programming puzzles.

True to its reputation, it was easy to get a project started in Kotlin (using IntelliJ) and familiar tools like JUnit for testing are available without any trouble.

The puzzles in Advent of Code are of a particular type: mainly computational tasks, with algorithmic tricks often necessary for the more difficult second half of each challenge. So this was really a test of how well I could use Kotlin for these kinds of problems, rather than, say, doing a lot of data cleaning or creating an Android app. The well-written problem descriptions lend themselves to a TDD approach, as they almost always include examples that capture the complexity of the problem.

You can take a look at my solutions for the first 21 days of problems on GitHub. In the run-up to Christmas, family preparations and celebrations took precedence over coding puzzles, but I’ll update the repo if I get time to complete the last few days.

It’s early days for my use of Kotlin, but so far I’ve found the language to be a nice mix of my favourite parts of Python (simplicity and flexibility) and Java (type safety and language specific tooling). Recently there has even been a movement to build a community of data scientists using Kotlin, and Thomas Nield’s videos are a good way to get started in this area.

 

DataDive toolbelt

What tools do you need to bring to a DataDive? The next DataKind UK DataDive is taking place in two weeks’ time in London. I took part in one of the previous DataDives and I would highly recommend the experience for anyone with data science or analytical skills who wants to help charities use their data.

The DataDives take place over the course of a weekend and in that time you have to decide on a charity to work with, understand their data and goals, perform your analysis and present your results in a usable form. That’s a lot to get through in just over two days so it’s very important to be able to get up and running quickly with the analysis. I thought it might be useful to list the software and tools that I will be packing in my DataDive toolbelt this time around.

 

Pyinspire – Python script to access INSPIRE database

The new INSPIRE HEP database has been up and running for a while now, and is going from strength to strength (despite some recent wobbles concerning citation counts).

I recently needed to get the BibTeX entries for a few papers and, instead of copy-pasting from the results web page each time, I wondered whether a more programmatic solution existed. There used to be a few utilities for SPIRES which enabled you to get results using a script, but I haven’t seen any that do this with INSPIRE (although there is a plugin for JabRef).

I decided to cook up a quick script called pyinspire that will send INSPIRE a query and scrape the resulting page for results. It is available now in the Python Package Index with the source code on Bitbucket. I’ve released it under a modified BSD license so feel free to fork. Installation is as easy as pip install pyinspire or easy_install pyinspire.

The functionality is very basic at the moment but does include output in BibTeX, LaTeX(EU) and LaTeX(US) modes, and standard text output including citation counts. See the Bitbucket page for more details.

 

Minor Tick Labels in Matplotlib

This is a slightly more technical post than usual but having figured out how to do something quite esoteric in Matplotlib I thought I would write it down to save me remembering.

I have been making quite a few plots recently for a paper which should hit the arXiv very soon. The Python plotting package Matplotlib has been indispensable in this regard, especially as I took the effort of creating a script which creates all the plots. This meant that redoing all the graphs for new results or with changed sizes etc. was as simple as rerunning the script.

Quite a few of the plots use log axes and while Matplotlib performs admirably there was one problem I had with certain plots. By default, the log plots only show tick labels for each order of magnitude. Tick labels are the numbers on the x or y axis telling you the corresponding numerical value, and when the figure is zoomed in it is possible to lose the major tick label at, say, 10^-4 because you only want to plot values between 0.3×10^-4 and 0.5×10^-4. Obviously this removes all sense of scale from the plot. A very mediocre solution is to just zoom out until a major tick label is back in the plot, but this is unsatisfactory.

I looked through the Matplotlib documentation, which has very detailed information about the API and has a lot of examples, but unfortunately didn’t address this exact point. After a bit of searching I found a useful conversation on the users mailing list which got me close but didn’t use the LaTeX labels which are really essential for publication quality graphs (in my opinion anyway!). The tick labels documentation along with the major-minor ticks example led me to the Formatter classes, especially LogFormatter and LogFormatterMathtext. This looked like the right answer but unfortunately LogFormatterMathtext writes the minor tick labels in a very unusual way. Instead of 0.3×10^-4 it only writes an exponent, so 10^-4.52.
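To see where that fractional exponent comes from, here is a small standalone sketch using only the standard library, not Matplotlib itself. The variable names are mine and the label strings are only illustrative; the exact text Matplotlib produces may differ between versions.

```python
import math

# The example tick value from the text: 0.3 x 10^-4, i.e. 3e-5.
value = 0.3e-4

# Writing the value purely as a power of ten needs a fractional exponent.
exponent = math.log10(value)
pure_power_label = "10^{:.2f}".format(exponent)

# Scientific notation keeps an integer exponent and a readable mantissa.
scientific_label = "{:.0e}".format(value)

print(pure_power_label)   # 10^-4.52
print(scientific_label)   # 3e-05
```

The first form is mathematically fine but hard to read off a plot, which is why I wanted a mantissa and an integer exponent instead.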

I finally settled on extending the pylab.LogFormatter class, which controls the text for the tick labels. My subclass is as follows:

import re
import pylab

class LogFormatterTeXExponent(pylab.LogFormatter, object):
    """Extends pylab.LogFormatter to use 
    tex notation for tick labels."""
    
    def __init__(self, *args, **kwargs):
        super(LogFormatterTeXExponent, 
              self).__init__(*args, **kwargs)
        
    def __call__(self, *args, **kwargs):
        """Wrap call to parent class with 
        change to tex notation."""
        label = super(LogFormatterTeXExponent, 
                      self).__call__(*args, **kwargs)
        label = re.sub(r'e(\S)0?(\d+)', 
                       r'\\times 10^{\1\2}', 
                       str(label))
        label = "$" + label + "$"
        return label

It is provided as is, but there shouldn’t be too much wrong with it. One odd thing is that the LogFormatter class is an old-style class, so I inherited from object to make my subclass a new-style class. This might be dangerous and cause some unexpected problems.
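The regular expression in __call__ can be tried on its own, without Matplotlib. The sketch below applies the same substitution to a few sample strings; the helper name to_tex_label and the sample inputs are my own assumptions about the kind of text the parent class returns, so treat it as an illustration rather than a test of LogFormatter itself.

```python
import re

def to_tex_label(label):
    """Rewrite an 'e' exponent such as '3e-05' in TeX and wrap it in math mode."""
    # Same pattern as in the subclass: keep the sign, drop one leading zero.
    tex = re.sub(r'e(\S)0?(\d+)', r'\\times 10^{\1\2}', str(label))
    return "$" + tex + "$"

# Hypothetical label strings, chosen to exercise the sign and leading-zero cases.
for sample in ["3e-05", "5e-05", "2e+03", "4e-10"]:
    print(sample, "->", to_tex_label(sample))
```

Note that the pattern keeps an explicit plus sign, so a label like 2e+03 becomes 10^{+3} rather than 10^{3}.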

To use the class you can do something like the following:

import pylab
import numpy as np

fig = pylab.figure()
pylab.semilogy(np.logspace(-6,-5))
ax = fig.gca()
ax.yaxis.set_minor_formatter(
    LogFormatterTeXExponent(base=10, 
     labelOnlyBase=False))
pylab.draw()

Below are three different figures showing the current default situation, the result of using LogFormatterMathtext and the result of the new class. I hope this will be of use to someone who has been struggling with this problem as I have.

As I mentioned, this came up because of a paper that is very nearly completed and should be available soon. Along with the paper we should have the long promised release of the code I have been working on which solves cosmological perturbation equations during inflation. More on that soon.

 

Inspire Beta: New Interface to SPIRES database

A colleague mentioned today that the front page of the venerable SPIRES database of High Energy Physics papers is now promoting the new INSPIRE interface, which was announced a few years ago.

The website for the current beta phase of the project is http://inspirebeta.net.

It is unclear to me whether “Beta” is part of the site name, as suggested by the URL and the text on the INSPIRE page, or whether this is just the beta phase of the INSPIRE project as the SPIRES homepage seems to imply. It would certainly be an odd decision to use a different URL for the beta phase and force everyone to change bookmarks, references in blogs, literature etc., once the beta phase is over.

These are quick first impressions because I haven’t had much time to use the new service. First off it is fast. Very fast. So fast that when searching for my name INSPIRE claims that the “Search took 0.00 seconds”. It feels almost instantaneous. This might be because of a light load before the hordes using SPIRES are switched over. It is certainly an improvement on the interminable and often futile stretches of time needed with the SPIRES engine.
