Own your own data

JMP is a powerful analytic and visualization tool, but in order to use it you need data. Many of the best features in JMP are related to importing your data – Query Builder, XML processing, multiple file import, and bringing in data with HTTP requests are all newer features geared toward bringing in data from other sources with ease so that you can analyze and visualize it in JMP. Recode and other features help you take messy data and transform it into a clean data set, and the data table provides features such as virtual join to combine data sources effectively. JMP is great at handling your data ... assuming you have access to it.

In our professions, we routinely work with data sources provided by clients and internal to our companies, analyzing data to increase profits, improve processes, and make decisions. As data scientists, some of us like to look at our own personal data, and this blog post began as I started to work on an upcoming paper about my running data. Unlike a few of my colleagues who have presented on the quantified self and their own personal data, my aim was less to tell a personal story and more to create visualizations and find predictors of success in a race by analyzing my training leading up to the race. Still, writing an analytical paper like this does bring in aspects of my personal story, and I wanted to share my experiences in trying to gather my running data.

I began running at NC State because I was required by my degree program to take an introductory PE class. Given the choices, I went with "Run Conditioning," which culminated in a 5K race. I got hooked on running and racing, and I purchased a Garmin Forerunner 301 that synced to a little Windows application. It was great for viewing the data, but it provided no visualizations and no way to share your info. At the time, I wasn't really interested in that, although I did like having a tally of my total miles run and weekly miles for motivation.

[Image: raceBibsAndMedals.jpg] Many runs and races are stored in various data sources over the years.

In addition, I was carrying an iPod to listen to music, and later got an iPhone capable of acting as its own GPS. At the time, there wasn't a lot of competition, and Nike had developed Nike+, which also had a way of handling treadmill runs without a GPS signal. I didn't want to have to put on a watch and carry my iPhone, which I was already doing to listen to music, so I moved over to Nike and retired the Forerunner.

Garmin had a way of exporting to its TCX file format, and Nike at the time allowed importing that format, so I consolidated my data. Nike's website provided nice visualizations and Google-powered maps that I could access from anywhere I had an internet connection, along with a public API, the ability to import and export my data, and even easy ways to share runs to social media if I wanted to.

This isn't a post about the alarming amount of data that people put into the cloud, social media, and third-party software or the security implications created by those decisions. I do find those topics important and interesting, and often overlooked, but I found myself a good 10 years later wanting to gather all of my data to work on this paper, and I couldn't.

The Nike+ website no longer shows any of the visualizations and stats it once did – you have to use the latest iteration of their app, called Nike Run Club. The app is fine and works much like the website used to, but there is no way to export data. If you search the web, you find a graveyard of broken links to now-removed REST API documentation, and a myriad of websites and apps, created specifically to get your Nike data, that have risen and fallen. Just about a year ago, I was able to extract my 2016 marathon run data using the n+exporter website, which has since been broken by changes Nike made to its API.

After much searching, I stumbled on mentions of an app called Run Gap, which works as a way to manage multiple data sources – you provide your login information, and it can import your data from more than 20 popular sources, including Nike for the time being.

I downloaded it and gave it a try, and was finally able to extract my data, although in a very inconvenient format of about 700 individual GPX files. I may just pay a fee to unlock the ability in Run Gap to transfer data directly from one source to another. After discussing this issue with my colleague Julian Parris (@julian), I have moved over to logging my current runs in Strava. The API support and import/export capabilities are much more user-friendly (at the moment).
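For anyone facing a similar pile of GPX files, here is a minimal Python sketch of one way to stack them into a single CSV that JMP can open. It is not what I actually ran. It assumes each file uses the standard GPX track-point layout (`trkpt` elements with `lat` and `lon` attributes and optional `ele` and `time` children), and the function names and the folder and file names in the usage line are hypothetical.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def read_trackpoints(gpx_path):
    """Yield one dict per <trkpt> element in a single GPX file."""
    root = ET.parse(gpx_path).getroot()
    for elem in root.iter():
        if local_name(elem.tag) != "trkpt":
            continue
        point = {
            "file": Path(gpx_path).name,  # keeps track of which run a row came from
            "lat": elem.get("lat"),
            "lon": elem.get("lon"),
            "ele": "",   # left blank when the file has no elevation
            "time": "",  # left blank when the file has no timestamp
        }
        for child in elem:
            name = local_name(child.tag)
            if name in ("ele", "time"):
                point[name] = (child.text or "").strip()
        yield point


def combine_gpx(folder, out_csv):
    """Write every track point found in `folder` to one CSV; return the row count."""
    fields = ["file", "lat", "lon", "ele", "time"]
    count = 0
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for gpx_path in sorted(Path(folder).glob("*.gpx")):
            for point in read_trackpoints(gpx_path):
                writer.writerow(point)
                count += 1
    return count
```

Calling something like `combine_gpx("gpx_exports", "all_runs.csv")` would then produce one table with a `file` column identifying each run. Matching on the tag name without its namespace is a deliberate shortcut so the sketch does not depend on which GPX version a given exporter writes.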

My point is this: If you are relying on a third party to store your data, you could lose it at any time. It is spelled out in the fine print. The Nike Terms of Service clearly state:

NIKE may terminate or modify any Nike Platform, member program, product or service at any time without notice.

NIKE may terminate or suspend your account, delete your profile or any of your User Content, and restrict your use of all or any part of the Platform at any time and for any reason, without any liability to Nike, subject to applicable law.

This means Nike could for any reason delete my data or lock me out of my account. And this is not unheard of. For example, in 2015, LinkedIn decided to remove the ability to mass export your contacts, and there was a huge backlash. Luckily, LinkedIn listened to its users and reversed course. Data is paramount to discovery, and data-related challenges are one of the biggest hurdles to AI. Collecting data can be difficult when it comes from other sources, but at least you can own your own data. So, go get your data while you can.