Welcome to my blog, which mainly focuses on topics of interest to me.
I am currently looking for new challenges.

Lions Squad distribution

Any rugby fan in the “home” countries will have spent this week analysing the squad announcement for this year’s British and Irish Lions tour of Australia. Warren Gatland has chosen his 37-man squad to bring down under and, unsurprisingly, there are quite a few Welshmen on the list. Despite losing to Ireland in the first round of this year’s Six Nations (yay!), Wales came storming back to win the championship for the second year running.

But how much did this performance impress Warren Gatland? Well, of the 37 players chosen this week, 15 are Welsh, well ahead of England with 10, Ireland with 9 and Scotland with a token contribution of 3 players. In percentage terms that is 40.5% Welsh, 27.0% English, 24.3% Irish and only 8.1% Scottish.
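As a quick check of the arithmetic, the percentages can be recomputed from the squad counts above. This is a minimal sketch; the `squad` dictionary and the `shares` helper are illustrative names, not part of any published code.

```python
# Lions 2013 squad composition by home nation, as listed in the post.
squad = {"Wales": 15, "England": 10, "Ireland": 9, "Scotland": 3}


def shares(counts):
    """Return each nation's percentage of the total, rounded to one decimal place."""
    total = sum(counts.values())
    return {nation: round(100 * n / total, 1) for nation, n in counts.items()}


if __name__ == "__main__":
    print(sum(squad.values()))  # 37
    for nation, pct in shares(squad).items():
        print(f"{nation}: {pct}%")
    # Wales: 40.5%, England: 27.0%, Ireland: 24.3%, Scotland: 8.1%
```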

To illustrate this dominance I thought it might be interesting to create a cartogram, a map in which the land area of each familiar region is replaced by some other variable, here the squad composition. Cartograms have become increasingly popular in the last few years, in particular for illustrating disparities on a world scale. In all these cases the comparison is inherently with the land areas of the standard map, with which we are all familiar.

So here is what the British Isles/these islands/British and Irish Isles looks like when transformed using the Lions 2013 Squad composition:


Lions Squad 2013 Cartogram, created by Ian Huston. Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License

As you can see, Wales is disproportionately represented in the Lions squad, as we might have expected. Ireland seems to be contributing about the right number of players for its size, given the relative lack of distortion, but England is clearly not contributing as many players as it might (not even their captain made it!).
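The "distortion" in the map can be put into numbers. A cartogram keeps the total map area fixed and gives each region a share of it equal to its share of the new variable, so each nation's scale factor is its target area divided by its real area. The sketch below computes that ratio; the land areas are rounded approximations (with "Ireland" meaning the whole island, since the Irish team draws on both jurisdictions) and should be treated as assumptions. It does not reproduce the map itself, which needs a cartogram algorithm to reshape the borders.

```python
# Approximate land areas in km^2, rounded; treat these as assumptions, not official figures.
# "Ireland" is the whole island, matching the Irish rugby team.
AREA_KM2 = {"England": 130_000, "Scotland": 78_000, "Wales": 21_000, "Ireland": 84_000}
SQUAD = {"Wales": 15, "England": 10, "Ireland": 9, "Scotland": 3}


def cartogram_scale(areas, values):
    """Ratio of each region's target cartogram area to its real area.

    The total area is preserved: target = total_area * value / total_value.
    A ratio above 1 means the region swells on the cartogram; below 1, it shrinks.
    """
    total_area = sum(areas.values())
    total_value = sum(values.values())
    return {
        region: (total_area * values[region] / total_value) / areas[region]
        for region in areas
    }


if __name__ == "__main__":
    ratios = cartogram_scale(AREA_KM2, SQUAD)
    for region, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
        print(f"{region:<9} x{ratio:.2f}")
```

With these rounded areas Wales comes out at roughly six times its real size, Ireland just under 1, and England and Scotland at roughly two-thirds and one-third respectively, which is the pattern visible in the map.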


Posted in Data Science

New London Institute of Cosmology

At a London Cosmology Discussion Meeting last week, representatives of the London cosmology groups discussed ways of increasing collaboration and cooperation between the groups, and of making more coherent use of the various skills available in London.

One of the main outcomes of this meeting was a consensus to form a ‘virtual’ institute to act as an umbrella organisation for the collaboration efforts. This London Institute of Cosmology would initially act as a clearing house for information about events (especially the groups’ seminars) and about research visitors to London, and would provide a structure for possible funding applications.

I am happy to say that a preliminary website has now been constructed with information about the groups in London, their seminar programmes and the London Cosmology Discussion Meetings. The address for the site is http://londoncosmology.org, which will hopefully rise up the Google search rankings soon. There is much work to be done on the site and the institute, but hopefully this is a good start and provides a visible link between the research activities of the London groups.

Posted in Research

Data diving for charity

Last weekend I took part in the first London DataDive, a charitable event organised by DataKind, which has previously run similar events across the US. The basic premise is that charities collect large amounts of data on donors, fund-raising and the actual care, help or interventions they provide. Without analysts, who are costly, to sort through and make sense of the data, it goes unused and provides little or no value to the organisation.

DataKind wants to solve this problem by bringing together business consultants, data scientists and other analysts to provide pro bono services to the charities over the course of a weekend. The format is similar to a hackathon: Friday night is spent networking, learning about the charities’ problems and picking one to work with; Saturday is spent working on the data to produce actionable results for the charities; and these results are presented on Sunday morning along with any considerations or suggestions from the data scientists.

The three charities at the London event were Oxfam, Place2Be and Keyfund. Intrigued by the Friday night speech from Hannah of Keyfund, I opted to help them over the weekend. Keyfund works with young people to develop their skills and confidence through small projects that are conceived, planned and implemented by the young people themselves. Keyfund coordinates the assessment and funding of these projects through partnerships with local organisations across the country.

Over the weekend we analysed Keyfund’s data in a number of ways. In particular, we looked at the demographics of the children in the scheme, quantified the outcomes in terms of self-assessments and skills profiles, and assessed the likely effect of streamlining their process into fewer stages. Hopefully the results will help Hannah and the Keyfund team to assess their procedures and to convince funders to support this worthy cause.

On the technical side, I took this opportunity to learn more about Wes McKinney’s Pandas library, which provides a structured-data companion to NumPy’s more homogeneous arrays. The terminology is quite similar to R’s, with data frames and series taking the place of arrays and vectors. Some elements took a bit of getting used to, but one powerful feature is the close integration with Matplotlib, which makes it easy to create histograms and box plots directly from data frames. I hope to look further into Pandas, having just bought Wes McKinney’s new book “Python for Data Analysis”.
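To illustrate the R-like vocabulary and the Matplotlib hooks, here is a minimal sketch on a hypothetical toy table. The numbers are invented and are not Keyfund’s data, and the column names `age_group`, `score_before` and `score_after` are made up for the example. It uses standard pandas calls: a single column is a `Series`, `groupby` gives R-style split-apply-combine summaries, and `Series.plot.hist` and `DataFrame.boxplot` draw straight onto Matplotlib axes.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data (NOT Keyfund's): one row per participant,
# a categorical column and two numeric self-assessment scores.
df = pd.DataFrame({
    "age_group": ["11-13", "11-13", "14-16", "14-16", "17-19", "17-19"],
    "score_before": [4, 5, 3, 6, 5, 4],
    "score_after": [6, 7, 5, 7, 8, 6],
})

# Column arithmetic works element-wise on Series, much like R vectors.
df["change"] = df["score_after"] - df["score_before"]

# Split-apply-combine: mean change per age group (a Series indexed by group).
by_group = df.groupby("age_group")["change"].mean()

# Plots come straight from the DataFrame onto ordinary Matplotlib axes.
fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(8, 3))
df["change"].plot.hist(ax=ax_hist, bins=4, title="Change in self-assessment")
df.boxplot(column="change", by="age_group", ax=ax_box)

buf = io.BytesIO()
fig.savefig(buf, format="png")  # write to memory rather than disk
```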

I really enjoyed the first international DataDive and I am grateful for the work that the organisers, Jake Porway and Craig Barowsky, put in to make everything run smoothly. The atmosphere was great throughout the weekend, including late into Saturday night, and everyone’s participation was inspiring. At a time when the gender imbalance in science and technology is making headlines, it was also great to see an event where this wasn’t an issue in the slightest. I would heartily recommend that anyone who works with data give something back to their community by taking part in one of these events. Plans are under way for more events like this in London, and I will be jumping at the chance to get involved again.

Update: I just noticed that Dirk Gorissen, who was on my team, has a nice write-up with some results (including one of my graphs).

Posted in Interesting Things

Pyinspire – Python script to access INSPIRE database

The new INSPIRE HEP database has been up and running for a while now, and is going from strength to strength (despite some recent wobbles concerning citation counts).

I recently needed to get the BibTeX entries for a few papers and, instead of copy-pasting from the results web page each time, I wondered whether a more programmatic solution existed. There used to be a few utilities for SPIRES that let you retrieve results from a script, but I haven’t seen any that do this with INSPIRE (although there is a plugin for JabRef).

I decided to cook up a quick script called pyinspire that sends INSPIRE a query and scrapes the resulting page for results. It is available now in the Python Package Index, with the source code on Bitbucket. I’ve released it under a modified BSD license, so feel free to fork it. Installation is as easy as pip install pyinspire or easy_install pyinspire.

The functionality is very basic at the moment but does include output in BibTeX, LaTeX(EU) and LaTeX(US) modes, and standard text output including citation counts. See the Bitbucket page for more details.
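For the curious, here is a minimal sketch of the query-and-scrape approach using only the Python standard library. It is not pyinspire’s actual code: the base URL, the `p` (query) and `of` (output format) parameters and the format codes are assumptions modelled on the Invenio software INSPIRE ran on, and the idea that formatted records appear inside `<pre>` elements is likewise a hypothetical simplification.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import urlopen

# ASSUMED base URL and Invenio-style output-format codes; check against the live service.
SEARCH_URL = "http://inspirehep.net/search"
OUTPUT_FORMATS = {"bibtex": "hx", "latex_eu": "hlxe", "latex_us": "hlxu"}


def build_query_url(query, mode="bibtex"):
    """Build a search URL for `query` (SPIRES-style syntax) in the given output mode."""
    params = {"p": query, "of": OUTPUT_FORMATS[mode]}
    return f"{SEARCH_URL}?{urlencode(params)}"


class PreBlockParser(HTMLParser):
    """Collect the text of every <pre> element (assumed to hold one formatted record each)."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_pre = False

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._in_pre = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "pre":
            self._in_pre = False

    def handle_data(self, data):
        if self._in_pre:
            self.blocks[-1] += data


def scrape_records(html):
    """Return the stripped text of each <pre> block in an HTML page."""
    parser = PreBlockParser()
    parser.feed(html)
    parser.close()
    return [block.strip() for block in parser.blocks]


def fetch(query, mode="bibtex"):
    """Send the query and scrape the result page (needs network access)."""
    with urlopen(build_query_url(query, mode)) as response:
        return scrape_records(response.read().decode("utf-8"))
```

Only `build_query_url` and `scrape_records` can be checked offline; `fetch` needs network access and an endpoint that matches these assumptions.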

Posted in Research, Tools

Trispectrum during Inflation

After a lot of work, “Large trispectrum in two-field slow-roll inflation” was released on the arXiv yesterday as arXiv:1203.6844. In this article Joe Elliston, Laila Alabidi, David Mulryne, Reza Tavakol and I look at the generation of higher order statistics during inflation in the early universe.

In the early universe the curvature perturbations, which are later seen as temperature fluctuations in the Cosmic Microwave Background (CMB), are thought to start out Gaussian, but can become skewed during inflation depending on the physics of their evolution. In the last few years a lot of work has been done both to find evidence of this non-Gaussianity and to construct physical models in which it is generated in the early universe.

In the past, most of the focus has been on the 3-point function, or bispectrum, and discussion of non-Gaussianity has boiled down to finding bounds on the parameter $f_{\mathrm{NL}}$. In terms of the CMB, the bispectrum in essence measures whether the temperatures at three points on the sky are correlated. The WMAP satellite has not seen any definitive evidence of a non-zero value of $f_{\mathrm{NL}}$, but the Planck satellite should be able to detect a signal if it is moderately large. In this work we look beyond the bispectrum to the 4-point function, or trispectrum, which measures the correlation between four different points on the sky.
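For reference, here is the usual local-type parametrisation of the bispectrum and trispectrum of the curvature perturbation $\zeta$, with $P(k)$ its power spectrum and $k_{14} \equiv |\mathbf{k}_1+\mathbf{k}_4|$. This follows one common convention; numerical prefactors differ between papers, so it should not be read as the exact notation of arXiv:1203.6844.

```latex
\begin{align}
\langle \zeta_{\mathbf{k}_1}\zeta_{\mathbf{k}_2}\zeta_{\mathbf{k}_3}\rangle
  &= (2\pi)^3\,\delta^3(\mathbf{k}_1+\mathbf{k}_2+\mathbf{k}_3)\,B_\zeta(k_1,k_2,k_3),\\
B_\zeta &= \frac{6}{5}\,f_{\mathrm{NL}}\left[P(k_1)P(k_2)+P(k_2)P(k_3)+P(k_3)P(k_1)\right],\\
\langle \zeta_{\mathbf{k}_1}\zeta_{\mathbf{k}_2}\zeta_{\mathbf{k}_3}\zeta_{\mathbf{k}_4}\rangle_c
  &= (2\pi)^3\,\delta^3(\mathbf{k}_1+\mathbf{k}_2+\mathbf{k}_3+\mathbf{k}_4)\,
     T_\zeta(\mathbf{k}_1,\mathbf{k}_2,\mathbf{k}_3,\mathbf{k}_4),\\
T_\zeta &= \tau_{\mathrm{NL}}\left[P(k_1)P(k_2)P(k_{14}) + 11~\text{perms}\right]
  + \frac{54}{25}\,g_{\mathrm{NL}}\left[P(k_1)P(k_2)P(k_3) + 3~\text{perms}\right].
\end{align}
```

The $\tau_{\mathrm{NL}}$ and $g_{\mathrm{NL}}$ terms have different momentum dependence, which is why the trispectrum needs two parameters rather than one.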

In this work we have tried to find models that generate a large trispectrum during inflation. We have found some new expressions for the parameters $f_{\mathrm{NL}}$, $\tau_{\mathrm{NL}}$ and $g_{\mathrm{NL}}$, the last two of which parametrise two different contributions to the trispectrum. The bottom line is that it is quite difficult to find conditions under which the trispectrum can be large, at least under the assumptions we made (sum- and product-separable potentials, analysed with the $\delta N$ formalism).
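In the $\delta N$ formalism, $\zeta$ is the perturbation in the number of e-folds $N$, expanded in the field perturbations $\delta\varphi^A$ at horizon crossing, with $N_A \equiv \partial N/\partial\varphi^A$, $N_{AB} \equiv \partial^2 N/\partial\varphi^A\partial\varphi^B$ and so on, and a sum over repeated field indices. Assuming canonical fields whose perturbations are Gaussian with equal power spectra at horizon crossing, and keeping only the leading (tree-level) terms, the standard general results in the normalisation of the usual local-type definitions are given below. These are the general expressions, not the new sum- and product-separable forms derived in the paper.

```latex
\begin{align}
\zeta &= N_A\,\delta\varphi^A + \frac{1}{2}N_{AB}\,\delta\varphi^A\delta\varphi^B
       + \frac{1}{6}N_{ABC}\,\delta\varphi^A\delta\varphi^B\delta\varphi^C + \cdots,\\
\frac{6}{5}\,f_{\mathrm{NL}} &= \frac{N_A N_B N_{AB}}{\left(N_C N_C\right)^2},\\
\tau_{\mathrm{NL}} &= \frac{N_{AB} N_{AC} N_B N_C}{\left(N_D N_D\right)^3},\\
\frac{54}{25}\,g_{\mathrm{NL}} &= \frac{N_{ABC} N_A N_B N_C}{\left(N_D N_D\right)^3}.
\end{align}
```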

In the course of searching for models that give large values of these parameters, we plotted the coefficient functions that need to be large as heatmaps, following Byrnes et al. (arXiv:0807.1101). To generate these heatmaps I relied on the combination of Python, NumPy and Matplotlib, which I have used before in Pyflation. The script I used to generate the heatmap figures in the paper is now available as a repository on Bitbucket.
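Here is a minimal sketch of the heatmap technique with NumPy and Matplotlib: evaluate a function of two parameters on a grid with `np.meshgrid` and draw it with `pcolormesh` on a logarithmic colour scale. The function `coefficient` and its parameters `u` and `v` are hypothetical placeholders, not the coefficient functions from the paper or the Bitbucket script. The log colour scale is a design choice assumed here because such quantities can span several orders of magnitude.

```python
import io

import matplotlib

matplotlib.use("Agg")  # off-screen rendering, no display needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LogNorm


def coefficient(u, v):
    """HYPOTHETICAL stand-in for a coefficient function of two parameters.

    Not one of the paper's functions: just a smooth positive function with a
    sharp peak at (0.5, 0.5), so the heatmap has some structure to show.
    """
    return 1.0 / ((u - 0.5) ** 2 + (v - 0.5) ** 2 + 1e-3)


# Evaluate the function on a regular grid of the two parameters.
u = np.linspace(0.0, 1.0, 201)
v = np.linspace(0.0, 1.0, 201)
U, V = np.meshgrid(u, v)  # U[i, j] == u[j], V[i, j] == v[i]
Z = coefficient(U, V)

# Log colour scale so both the peak and the flat regions remain visible.
fig, ax = plt.subplots(figsize=(5, 4))
mesh = ax.pcolormesh(U, V, Z, norm=LogNorm(vmin=Z.min(), vmax=Z.max()), shading="auto")
fig.colorbar(mesh, ax=ax, label="coefficient (hypothetical)")
ax.set_xlabel("u")
ax.set_ylabel("v")

buf = io.BytesIO()
fig.savefig(buf, format="png")  # write to memory rather than disk
```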

PS Someone really needs to work on the Wikipedia Non-Gaussianity page!

Posted in arXiv, Research