Mar 23, 2020 7:45 AM
| Last Modified: Mar 25, 2020 2:22 AM
For the last six years, I have had the privilege of visiting scientists and engineers at the most innovative companies in the UK and Ireland. Like most other people right now, I am looking at a long period of working from home. This feels like a good time to take stock and see what I can learn about my work travel patterns.
Using Exercise Activity Tracking Data
"The Golden Triangle" between Oxford, Cambridge and London.
Travelling to customers often requires an overnight stay. When I started this job, I decided that I should always go for a morning run when staying away for work. To my surprise, I have mostly achieved this. The proof is in the data from the exercise tracking app on my phone that I have used to log these activities.
What can I find out from this data? Could the data tell me where the hotspots are for R&D and innovation in the UK and Ireland? It feels like I have spent a lot of time in the south of England, particularly just north and west of London. The “Golden Triangle” of Oxford, Cambridge and London is often spoken of as having much of the UK’s thriving biotech industry as well as the main sites for UK "Big Pharma" companies. Are there other hotspots?
By exploring the data, I found answers to these questions. I also found that this is an example where preparing and analysing the data were not distinct steps. This is often the case, in my experience. People say that they spend more than 80% of their time just preparing the data. I think that you need tools that enable rapid iteration between the preparation and analysis steps to get insight from your data efficiently and effectively. Let me show you…
Getting the Data In
My first challenge was that the data for each activity were in more than 600 separate XML files. I needed to get these into a single table for my analysis.
Importing 600 XML files using JMP.
Thankfully, JMP has some great tools to make this quick and easy. I have written a separate post about how I did this.
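For readers without JMP, the same idea (one row per activity file) can be sketched in Python. This is only an illustration, not the method I used: it assumes the files are GPX-style XML in which each recorded position is a `trkpt` element with `lat` and `lon` attributes, and the function names and the folder argument are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from statistics import mean


def summarise_root(root):
    """Return mean latitude/longitude for one activity, or None if it has no points.

    Assumes GPX-style <trkpt lat="..." lon="..."> elements (an assumption,
    the real export format may differ).
    """
    # Compare local tag names so any XML namespace prefix is ignored.
    points = [el for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "trkpt"]
    if not points:
        return None
    return {
        "mean_lat": mean(float(p.get("lat")) for p in points),
        "mean_lon": mean(float(p.get("lon")) for p in points),
        "n_points": len(points),
    }


def combine_folder(folder):
    """Build a single table (list of dicts) with one row per XML file in `folder`."""
    rows = []
    for path in sorted(Path(folder).glob("*.xml")):
        summary = summarise_root(ET.parse(path).getroot())
        if summary is not None:
            rows.append({"file": path.name, **summary})
    return rows
```

Calling `combine_folder("activities")` on a folder of such files would give a list of rows that can be loaded into any table tool.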
Getting the Right Data
The data also include activities from when I have travelled to the US and around Europe for work and for holidays. I needed to exclude these activities. First, I plotted the average latitude and longitude of each activity. Then it was easy to select the UK and Ireland activities, and exclude and hide the rest:
Using dynamic, interactive visuals to prepare data.
Using this method, it was also easy to find, hide and exclude activities that I did from home and while on holiday in the UK. Being able to use the interactivity of the plot during data preparation saved a lot of time.
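I did this selection interactively on the plot, but a rough scripted equivalent might look like the sketch below. It assumes each activity is a dict with `mean_lat` and `mean_lon` keys (hypothetical names). The bounding box values are approximate figures chosen for illustration, and the home location and exclusion radius are parameters you would supply yourself. A simple box also catches a small part of northern France, which is one reason the interactive approach was more convenient.

```python
from math import radians, sin, cos, asin, sqrt

# Rough bounding box for the UK and Ireland (approximate values, for illustration only).
LAT_MIN, LAT_MAX = 49.8, 60.9
LON_MIN, LON_MAX = -10.7, 1.8


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


def keep_activity(row, home, home_radius_km=10.0):
    """True if the activity is inside the box and farther than the radius from `home`.

    `home` is a (lat, lon) tuple; `home_radius_km` is a hypothetical threshold.
    """
    in_box = (LAT_MIN <= row["mean_lat"] <= LAT_MAX
              and LON_MIN <= row["mean_lon"] <= LON_MAX)
    if not in_box:
        return False
    distance = haversine_km(row["mean_lat"], row["mean_lon"], home[0], home[1])
    return distance > home_radius_km
```

This does not separate holiday trips within the UK from work trips. Those still need a manual check, or a list of dates.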
What Did I Find Out?
Once I was happy that I had excluded data from home and non-work trips, I plotted a density map of all activities. There is definite evidence of the golden triangle:
Plot of all my exercise activities when travelling for work in the UK and Ireland. Density plot overlaid.
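The density overlay in the plot comes from JMP. As a much cruder stand-in, the sketch below counts activities in a regular latitude/longitude grid and reports the busiest cells. It is a simple 2D histogram, not the smoothed density estimate shown in the figure, and the cell size and function names are assumptions made for illustration.

```python
from collections import Counter
from math import floor


def density_grid(points, cell_deg=0.5):
    """Count (lat, lon) points per grid cell of size `cell_deg` degrees."""
    counts = Counter()
    for lat, lon in points:
        counts[(floor(lat / cell_deg), floor(lon / cell_deg))] += 1
    return counts


def hotspots(points, cell_deg=0.5, top=3):
    """Return the `top` busiest cells as ((south_edge, west_edge), count) pairs."""
    grid = density_grid(points, cell_deg)
    return [((i * cell_deg, j * cell_deg), n) for (i, j), n in grid.most_common(top)]
```

Cells in degrees are not equal in area, and they get narrower towards the north, so this is only good for a first look.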
There are also apparent hotspots in the North West (surrounding Liverpool and Manchester) and the North East (Newcastle and Teesside) of England, which are both areas with a long history of chemical manufacturing. Of course, there are lots of reasons not to trust this as a definitive map of R&D activity in the UK and Ireland (see “A Few Caveats”, below). So I looked at a more reliable source of data ("Business enterprise research and development, UK: 2018", Office for National Statistics):
2018 R&D spend by UK region (data from Office for National Statistics)
There is agreement that the East of England (including Cambridge) and the South East (including Oxford) are where the largest share of R&D money is spent: 41% of all UK R&D investment is in these two regions. However, R&D spend in the North East is apparently very small (1.8%). Any ideas why? I will have to see if I can find some equivalent data for the Republic of Ireland.
I have also visualised the data from the Office for National Statistics using a bubble plot map:
This interactive visual is published in JMP Public. Press play and interact with controls to change speed and bubble size.
I am not sure that this visual adds a lot to our insight. I've seen a lot of this kind of plot recently, so I thought it would be nice to make one that shows the increase in something positive.
Lots of people are currently working extremely hard to use any available data to answer some very important questions. By contrast, this was a very trivial analytics example. However, it shows how you can get to answers faster with agile methods for data access, preparation, visual exploration and analysis.
A Few Caveats
I don't cover all customers and locations in the UK and Ireland. I have lots of colleagues who were also travelling to other locations during this time.
I’m not very consistent in logging my activities. Sometimes I forget to take my phone on my run. Sometimes I forget to use the app.
Some of these activities were around visits to our offices for internal meetings. I would expect that London and Marlow in Buckinghamshire (where we have our main UK offices) are over-represented because of this.
Some of these activities are around visits to conferences and other events.
For most of this time, I was living in Scotland, near Edinburgh. Areas that are easily reached within a couple of hours from there would not warrant an overnight stay so they are under-represented in this analysis. This includes some places with significant R&D like Edinburgh, Glasgow, Newcastle and Dundee.