This blog post was written by a blogger who is no longer at SAS
Frequently, statisticians have to act like doctors. We see statistical reports that try to describe a relationship: how fast rumors spread depending on how large a company is, how crop yield depends on nitrogen content, how gas usage depends on speed. Almost anything you can think of.
So today, put on your diagnostician's cap and look at the four relationships I show you here. To keep you from guessing, I've hidden the labels for the variables, so you'll be looking at Y1 and X1, Y2 and X2, Y3 and X3, and Y4 and X4. Here's the data:
| X1 | Y1 | X2 | Y2 | X3 | Y3 | X4 | Y4 |
|---|---|---|---|---|---|---|---|
| 10 | 8.04 | 10 | 9.14 | 10 | 7.46 | 8 | 6.58 |
| 8 | 6.95 | 8 | 8.14 | 8 | 6.77 | 8 | 5.76 |
| 13 | 7.58 | 13 | 8.74 | 13 | 12.74 | 8 | 7.71 |
| 9 | 8.81 | 9 | 8.77 | 9 | 7.11 | 8 | 8.84 |
| 11 | 8.33 | 11 | 9.26 | 11 | 7.81 | 8 | 8.47 |
| 14 | 9.96 | 14 | 8.1 | 14 | 8.84 | 8 | 7.04 |
| 6 | 7.24 | 6 | 6.13 | 6 | 6.08 | 8 | 5.25 |
| 4 | 4.26 | 4 | 3.1 | 4 | 5.39 | 19 | 12.5 |
| 12 | 10.84 | 12 | 9.13 | 12 | 8.15 | 8 | 5.56 |
| 7 | 4.82 | 7 | 7.26 | 7 | 6.42 | 8 | 7.91 |
| 5 | 5.68 | 5 | 4.74 | 5 | 5.73 | 8 | 6.89 |
I've even taken some advice I heard at a conference and added a plot to the statistics so that you can better see the relationship. I fit the least-squares line to each set and attached the plot of the line.
Here's Y1 vs. X1:

I highlighted some typical statistics that statisticians might use in discussing how well this line fits. Circles in the picture show the equation of the line (essentially y = 3 + ½x), the R² (≈ 0.666), and the p-value for the F-statistic (≈ 0.002; the F-statistic itself is about 18). If you don't know what they are, bear with me. You'll still get the joke.
Here's Y2 by X2. Check the labels if you don't believe me:

Here's Y3 vs. X3:

And Y4 by X4:

You should have noticed that all the statistics are virtually identical. The graphs are identical too: in every case, the line of best fit is pretty much y = 3 + ½x.
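If you'd rather check that claim than take my word for it, here is a minimal sketch in plain Python (not the SAS code behind the pictures) that refits each line from the table above. The function name `fit_line` and the dictionary layout are my own choices for illustration; the formulas are the standard ones for simple least squares, with the F-statistic computed from R² as F = R² / (1 − R²) × (n − 2).

```python
from statistics import mean

# Data copied from the table above. X1, X2, and X3 share the same values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
datasets = {
    "Y1 vs. X1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Y2 vs. X2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Y3 vs. X3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Y4 vs. X4": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(x, y):
    """Return (intercept, slope, r_squared, f_stat) for a simple least-squares fit."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    # With one predictor, F has 1 and n - 2 degrees of freedom.
    f_stat = r_squared / (1 - r_squared) * (n - 2)
    return intercept, slope, r_squared, f_stat


results = {name: fit_line(x, y) for name, (x, y) in datasets.items()}
for name, (b0, b1, r2, f) in results.items():
    print(f"{name}: y = {b0:.2f} + {b1:.3f}x, R^2 = {r2:.3f}, F = {f:.2f}")
```

All four lines of output should agree to about two decimal places: an intercept near 3, a slope near 0.5, an R² of roughly 0.67, and an F-statistic of roughly 18.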
Here's the playing-doctor part. Consider the fact that you've got four patients (graphs) exhibiting identical symptoms, numerically and graphically. What can you tell me about the underlying causes? Not much, it turns out. Although I blindly followed the "put a graph in there" rule, I left out the most important graph: that of the data itself.
Here are the four graphs again, with the data points turned on.
