With the 2008 Summer Olympics happening in China right now, I am reminded of when I used to run the 200-meter (1/2 lap) and 400-meter (1 lap) events on my high school track team. I wasn’t that good, but I enjoyed it. My best time in the 200m was about 25 seconds (current world record is 19.32 seconds), and my best time in the 400m was -- now don’t laugh -- about 58 seconds (current world record is 43.18 seconds). By the way, both those world records are held by the same man: Michael Johnson.

Consider the world record for every major running event, sprinting and distance. The IAAF’s (International Association of Athletics Federations) Web site has all the data. I entered the data into a JMP table, everything from 100m to 42,194.988m, which is a marathon. Times are recorded in seconds. The data can also be downloaded from JMP's file exchange.

Can the record be modeled/predicted as a function of distance? I make a scatterplot of Seconds vs. Meters using JMP’s **Fit Y by X** platform:

The points are bunched together for the shorter races, which makes the plot hard to read. I take the natural log of both Seconds and Meters and replot:

The relationship looks very linear and very strong! Let’s fit a simple linear regression using the **Fit Line** command:

The RSquare is .999. What model am I actually fitting? Consider the following power model:

Seconds = *a* * Meters^*b*, where *a* and *b* are parameters. I take the log of both sides to obtain:

ln(Seconds) = ln(*a*) + *b* * ln(Meters).

That is the same model I fit above: a straight line in the logs, with intercept ln(*a*) and slope *b*. In other words, the relationship between Seconds and Meters can be approximated by a power model, which becomes linear after taking logs. The parameters of the power model can be recovered from the fitted line. The *b* parameter is the slope, given above as 1.1065, and the *a* parameter is exp(intercept) = exp(-2.8081) = .060319.
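If you want to try the same log-log trick outside JMP, here is a minimal Python sketch. It is not the JMP analysis itself. The helper name `fit_power_model` is my own, and the demo points are made up from known parameters (they are not the world records), so the only thing it shows is that a least-squares line on the logs recovers *a* and *b*.

```python
import math

def fit_power_model(meters, seconds):
    """Fit Seconds = a * Meters^b by least squares on ln(Seconds) vs. ln(Meters)."""
    x = [math.log(m) for m in meters]
    y = [math.log(s) for s in seconds]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Closed-form slope and intercept of the simple linear regression line.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx                # slope is the exponent b
    ln_a = y_bar - b * x_bar     # intercept is ln(a)
    return math.exp(ln_a), b

# Made-up demo data generated from known parameters (NOT the real records).
demo_a, demo_b = 0.06, 1.1
demo_meters = [100, 400, 1500, 10000, 42195]
demo_seconds = [demo_a * m ** demo_b for m in demo_meters]

a_hat, b_hat = fit_power_model(demo_meters, demo_seconds)
print(f"a = {a_hat:.4f}, b = {b_hat:.4f}")
```

To use it on the real table, replace the demo lists with the Meters and Seconds columns.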

So, the final prediction model is:

Seconds = .060319*Meters^1.1065.

How well does it predict? The RSquare is high, as shown above. Also, for each row, I compute the absolute value of (Actual-Predicted)/Actual. The mean result is .036, meaning the predictions are off by 3.6% on average. The largest percent difference is for the 200m race, with a predicted value of 21.21 seconds against the actual record of 19.32.
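The per-row check can be sketched in a few lines of Python. This is only an illustration, assuming the fitted values quoted in this post (intercept -2.8081, slope 1.1065). The function names are mine, and the only records included are the two quoted earlier (200m and 400m), so the mean over the full table is not reproduced here.

```python
import math

A = math.exp(-2.8081)  # about .060319, from the fitted intercept
B = 1.1065             # fitted slope

def predict_seconds(meters):
    """Predicted record time from the power model Seconds = A * Meters^B."""
    return A * meters ** B

def abs_pct_diff(actual, predicted):
    """Absolute value of (Actual - Predicted) / Actual."""
    return abs(actual - predicted) / actual

# Only the two records quoted in this post; the full table has more rows.
records = {200: 19.32, 400: 43.18}

for meters, actual in records.items():
    predicted = predict_seconds(meters)
    diff = abs_pct_diff(actual, predicted)
    print(f"{meters}m: actual {actual:.2f}, predicted {predicted:.2f}, "
          f"abs pct diff {diff:.3f}")
```

For the 200m row this gives a predicted time of about 21.21 seconds, matching the value above.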

Now to the title of this blog entry, “What would be the world record?” Pick any number between 100 and 42,195. Let’s use 6731 meters. The model predicts that the current world record at that distance would be 1038.06 seconds (about 17 minutes 18 seconds). The model fits fairly well, so we might be close. But we’ll never know, because it’s not a real race.

I could have fit this model directly on Seconds and Meters, using the **Fit Special** command. It allows you to specify a transformation on the Y and X. The RSquare and parameters turn out to be the same.

Take note of the exponent in the model, the *b* parameter, 1.1065. Because that is so close to 1, the model is nearly linear. Will a simple straight line predict just as well as the power model? A linear model implies that the speed in meters per second (mps) is essentially the same for every distance. Computing the mps from the data gives the following:

The mps clearly decreases as the distance increases. I fit the linear model to the data. It fits well for the long races (stable mps), but it fits terribly for the short races (mps changing a lot). After all, I don't know anyone who can maintain a full sprint pace much longer than 400m.
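The power model tells the same story. Dividing Meters by the predicted Seconds gives an implied speed of Meters^(1-*b*)/*a*, and because *b* is greater than 1, that speed falls as the distance grows. Here is a rough Python sketch, assuming the fitted values from this post. The function name and the list of distances are just my illustration, and the printed speeds are model-implied, not computed from the actual records.

```python
import math

A = math.exp(-2.8081)  # about .060319, from the fitted intercept
B = 1.1065             # fitted slope

def model_mps(meters):
    """Average speed implied by the power model: Meters / predicted Seconds."""
    return meters / (A * meters ** B)

# Illustrative distances only; any values between 100 and 42,195 would do.
for meters in (100, 400, 1500, 10000, 42195):
    print(f"{meters:>6}m: {model_mps(meters):.2f} mps")
```

A straight line with a negligible intercept would instead give roughly the same mps at every distance, which is why it struggles with the short races.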

If you are wondering, I also fit a similar model to the women’s records. But I leave it to you to get the data and try it on your own.