Machines of Fit

 

John visually explores mechanical metaphors for statistical mechanisms, encouraging us to think visually and intuitively about statistical fits in terms of tensions and compressions.

 

 

Automatically Generated Transcript:


 

So you won't need your color vision to understand the next topic, but it is about seeing things. So let me share my screen.

And so my screen should be coming through now. So I'm going to talk about visualization. But it's in a different sense than we usually mean it. It's not about what you see with your eyes, but more like what you see when you close your eyes.

 

You use your inner eye, your insight, to perceive and imagine the forces and energies involved in the situation. But we're still talking about statistics, statistical fitting, which is a very abstract concept. So visualizing statistical fits means converting statistical concepts into mechanical metaphors, and we refer to those metaphors when we design the graphics. With these metaphors, we can understand things better and get an intuition for how fitting really works.

 

So statistical fitting is really an optimization problem: we talked about maximum likelihood, or minimizing the sum of squared errors. Optimization, if you study it, is mainly about balancing forces. You reach an optimum where moving from that position to any other position takes energy.

 

At the optimum, the net forces are zero, the derivative is zero, and the stored energy is at a minimum. As we balance the forces and reach this minimum energy solution, we can think about it in terms of simple machines. Here, we're going to use springs and gas pressure cylinders to visualize the forces and energies.

 

So we have two laws to consider.

 

First, the law of springs, due to Robert Hooke, who lived in the 1600s and was the curator of experiments for the Royal Society in the Age of Enlightenment. The other pioneer was Robert Boyle, who formulated the gas law, also in the Age of Enlightenment, and who is considered the father of modern chemistry. So we have Hooke's law and Boyle's law, which we're going to talk about in a few minutes. Now, I'm not the first to invoke Hooke's law and springs; it's been done before. In fact, there's a book about it by R. W. Farebrother, Visualizing Statistical Models and Concepts, and he refers to someone who did the same thing more than 100 years ago. The gas pressure idea is perhaps something a little newer, but it's one of my favorites. So with continuous responses, we're going to talk about springs, and with categorical responses, we're going to talk about probabilities and gas pressure.

 

So: Hooke's law and Boyle's law. Let me do a little demo. We're going to open up Big Class, and we have here a plot of height by weight. Now let's first suppose we're going to estimate the mean height.

 

Well, imagine each point is connected by a spring to a horizontal line. We have to stretch each spring from its resting length of zero to whatever the residual is in order to connect it to its point, and we can pull that line and put it anywhere we want. Each position we pull it to stores a different amount of energy, and the forces pull one way or the other. Here you see that most of the springs, in red, are pulling upward, and only a few are pulling downward. So we are going to release the springs (I'll press the Go button) until the line finally settles down and all the forces are equalized between one side and the other, storing the minimum amount of energy in those springs.
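The relaxation in the demo can be mimicked numerically. This is a minimal sketch, not JMP's implementation: the heights are made-up values (not the Big Class data), every spring is assumed to have a unit constant, and the line simply moves a small step in the direction of the net force until the forces cancel. The resting place it finds is the sample mean.

```python
# Hypothetical heights; NOT the actual Big Class values.
heights = [59.0, 61.0, 55.0, 66.0, 52.0, 60.0, 61.0, 51.0]


def spring_energy(level, ys, k=1.0):
    """Energy stored in springs of constant k stretched from each y to the line."""
    return sum(0.5 * k * (y - level) ** 2 for y in ys)


def net_force(level, ys, k=1.0):
    """Net upward pull on the line; each spring pulls with k * (y - level) (Hooke's law)."""
    return sum(k * (y - level) for y in ys)


def relax(ys, start=0.0, step=0.01, tol=1e-10, max_iter=100_000):
    """Move the line a small step along the net force until the forces balance."""
    level = start
    for _ in range(max_iter):
        force = net_force(level, ys)
        if abs(force) < tol:
            break
        level += step * force
    return level


level = relax(heights)
mean = sum(heights) / len(heights)
print(f"balance point {level:.6f}, sample mean {mean:.6f}")
print(f"energy at balance {spring_energy(level, heights):.4f}")
```

The step size only controls how fast the line settles; any small enough step ends at the same balance point, because the net force is zero only at the mean.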

 

We can also allow the line of fit to be more than horizontal: instead of fitting the mean, it's going to fit the regression line. If I press the Go button and allow it to have a slope, then it's going to feel the forces of those springs running vertically from each point to the line of fit, and the place that balances all the forces, both vertically and rotationally, is going to be the least squares regression line, as we'll see in a few minutes. So that's Hooke's law in operation for continuous responses.
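The same relaxation works for a sloped line if the springs also apply a torque. In this sketch (hypothetical data, unit springs, torque taken about x = 0) the line shifts along the net vertical force and rotates along the net torque; when both reach zero, the intercept and slope match the closed-form least squares solution.

```python
# Hypothetical (x, y) data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def forces(a, b):
    """Net vertical force and net torque (about x = 0) from unit springs on y = a + b*x."""
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    vertical = sum(resid)
    torque = sum(x * r for x, r in zip(xs, resid))
    return vertical, torque


# Relax: shift the line along the net force and rotate it along the net torque.
a, b, step = 0.0, 0.0, 0.01
for _ in range(200_000):
    fv, ft = forces(a, b)
    if abs(fv) < 1e-10 and abs(ft) < 1e-10:
        break
    a += step * fv
    b += step * ft

# Closed-form least squares for comparison.
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b_ls = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)
a_ls = ybar - b_ls * xbar
print(f"springs: a={a:.6f}, b={b:.6f}; least squares: a={a_ls:.6f}, b={b_ls:.6f}")
```

Zero net force and zero net torque are exactly the two normal equations of least squares, which is why the two answers agree.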

 

So how does a spring really behave? Here we see Hooke's law. As you stretch a spring from its resting state to some distance, the force increases linearly with distance. That's Hooke's law. The tension on the spring, or force, is the displacement x − μ, scaled by the spring constant, that is, how strong the spring is, and we're going to call that constant 1/σ². So how much energy does it take to stretch the spring from its resting point μ to some distance x? The energy is just the integral of that force over the distance. It starts with a small amount of force, and that force keeps increasing. If you integrate this straight line, you get the equation for energy: one half (that's the area of the triangle) times (x − μ)² times the spring constant 1/σ². So that's Hooke's law, the behavior of springs. Notice the squared term; that's the important part.
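Written out, the spring relations just described are as follows, with the spring constant taken as 1/σ². Summing the energy over n springs gives the residual sum of squares divided by 2σ², which, up to an additive constant, is also the negative log-likelihood of a normal model with mean μ and variance σ²; that is one way to see why the spring picture and least squares agree.

```latex
F(x) = \frac{x - \mu}{\sigma^{2}}, \qquad
E(x) = \int_{\mu}^{x} \frac{t - \mu}{\sigma^{2}}\,dt = \frac{1}{2}\,\frac{(x - \mu)^{2}}{\sigma^{2}}, \qquad
\sum_{i=1}^{n} E(x_i) = \frac{1}{2\sigma^{2}} \sum_{i=1}^{n} (x_i - \mu)^{2}
```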

 

So let's go to the next visualization, and imagine what we did with the animation before: fitting a mean. To connect each point to the line of fit, we need to stretch the springs, and we consider that line of fit to be movable. As we move the line up and down, the tensions on one side or the other vary until we reach a balance point. At that balance point, where the forces on one side and the other are equal, the springs are in their position of minimum energy. That balancing position for the springs is also the least squares solution for fitting a mean. So let's test a hypothesis with that. Here on the left, we have everything perfectly balanced, and we record the energy in the springs. Then we say, well, I have a hypothesized mean that's a little less than the sample mean.

 

So I'm going to move that line of fit down, and I have to put more energy into the springs to make that happen. Now it's not balanced anymore; I've had to add energy to the system. The energy we have to add can be expressed as a sum of squares, the same sum of squares as in the energy equation, and that is the sum of squares due to the hypothesis. The more I have to stretch the springs and store energy in them, the less likely it is that the hypothesized mean is the true mean. Okay, so visualizing a hypothesis test using this spring metaphor can be useful.
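A small numerical check of that claim, under stated assumptions: the sample and the hypothesized mean `mu0` are invented, the springs have unit constant, and `sum_sq` is twice the stored energy. The added sum of squares equals n(ȳ − μ₀)², and dividing it by the residual variance gives the square of the familiar one-sample t statistic.

```python
# Hypothetical sample and hypothesized mean (not from the talk).
ys = [59.0, 61.0, 55.0, 66.0, 52.0, 60.0, 61.0, 51.0]
mu0 = 56.0


def sum_sq(level):
    """Twice the energy stored in unit springs when the line of fit sits at `level`."""
    return sum((y - level) ** 2 for y in ys)


n = len(ys)
ybar = sum(ys) / n
added = sum_sq(mu0) - sum_sq(ybar)   # sum of squares due to the hypothesis
s2 = sum_sq(ybar) / (n - 1)          # residual variance estimate
t_squared = added / s2               # square of the one-sample t statistic
print(f"added SS = {added:.4f} (n*(ybar-mu0)^2 = {n * (ybar - mu0) ** 2:.4f}), t^2 = {t_squared:.4f}")
```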

 

Another situation concerns one-way analysis of variance, that is, testing that two or more means are equal. We can visualize each mean estimated separately. Here we have four observations in each of three different groups, and each observation is connected by a spring to a line of fit, which will be the sample mean for its group. By fitting separate means, the system is in a state of balance, with the minimum energy stored in the springs. But now we hypothesize that the means for all these groups are really the same. We do that by aligning all these lines of fit into one line of fit and finding the balance position for fitting just a single mean. The amount of energy we have to add to the springs in order to force that constraint is then the sum of squares due to that hypothesis. So testing in the one-way layout, or the two-sample test of means, is exactly analogous to adding energy to the set of springs in order to force that constraint.
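The one-way case as arithmetic, assuming a hypothetical layout of three groups with four observations each and unit springs. Locking the three lines of fit into one common line raises the sum of squares from the within-group value to the total; the difference is the between-groups sum of squares, and the usual F ratio follows from it.

```python
# Hypothetical one-way layout: three groups of four observations each.
groups = {
    "A": [5.0, 6.0, 7.0, 6.0],
    "B": [8.0, 9.0, 7.0, 8.0],
    "C": [4.0, 5.0, 5.0, 6.0],
}


def mean(v):
    return sum(v) / len(v)


all_obs = [y for ys in groups.values() for y in ys]
grand = mean(all_obs)

# Separate lines of fit: each group's springs balance at its own mean.
ss_within = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
# Constrained: all lines of fit locked together at one common mean.
ss_total = sum((y - grand) ** 2 for y in all_obs)
# Energy added by the constraint = sum of squares due to the hypothesis.
ss_hyp = ss_total - ss_within

k, N = len(groups), len(all_obs)
f_ratio = (ss_hyp / (k - 1)) / (ss_within / (N - k))
print(f"SS within = {ss_within:.4f}, SS due to hypothesis = {ss_hyp:.4f}, F = {f_ratio:.4f}")
```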

 

So what about concepts like design of experiments, where we want the design that makes our test most powerful? Consider an unbalanced design with six observations in one group and two observations in the other group. Where we have six observations, there are a lot of springs, and they make the line of fit very stiff; it's going to be very difficult to move that line of fit away from the true mean. But where we have only two springs, it can be very easy, so it's going to be easy to move this line of fit up to the others. We're not going to have a very powerful test of the hypothesis: the sum of squares due to the hypothesis is not going to be as great as with a balanced design. With a balanced design, where we have an equal number of observations in each group, we have a more powerful test. That unfolds into the theory of design of experiments and what the most powerful design is.
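A sketch of why balance helps, assuming unit springs, noise-free data, and a fixed gap d between the two true group means (all hypothetical choices). For eight observations split n1 and 8 − n1, the energy needed to pull both lines of fit onto one common line works out to n1·n2/(n1 + n2)·d², which is largest for the balanced 4/4 split.

```python
def hypothesis_ss(n1, n2, d=1.0):
    """Sum of squares (twice the energy in unit springs) needed to pull two noise-free
    group means that sit d apart onto one common line of fit."""
    grand = (n1 * 0.0 + n2 * d) / (n1 + n2)  # balance point for the common line
    return n1 * (0.0 - grand) ** 2 + n2 * (d - grand) ** 2


total = 8
for n1 in range(1, total):
    n2 = total - n1
    print(f"n1={n1}, n2={n2}: SS due to hypothesis = {hypothesis_ss(n1, n2):.3f}"
          f" (formula {n1 * n2 / (n1 + n2):.3f})")
```

With six and two observations the constraint costs 1.5 units; with four and four it costs 2, so the same true difference produces a larger test statistic.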

 

Now what if we have different variances in each of the groups? On the left, suppose we have a greater error variance. Remember, the spring constant is 1/σ², so that means we have a very weak spring constant; the mean is not held very tightly, and we can move it without using very much energy. But if we have a much smaller error variance, meaning a much larger spring constant 1/σ², then it's going to take a lot more energy to move the springs. So the idea of reducing residual variance to make hypothesis tests more sensitive means that we increase the strength of the springs by decreasing σ². And if we have two groups with different variances, for example, the allocation of samples between the two groups should be related to the spring constant, or error variance, of each group.
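One way to make that allocation concrete, as a sketch under stated assumptions: the error SDs σ1 = 2 and σ2 = 1 and the total of 30 runs are hypothetical. Group g's mean is held by n_g springs of constant 1/σ_g², and pulling two means toward each other acts like those two stiffnesses in series, so the reciprocals add. The resulting stiffness equals 1/Var(ȳ1 − ȳ2); maximizing it reproduces the standard allocation n_g ∝ σ_g (a textbook result brought in here, not stated in the talk). Doubling every n_g also doubles the stiffness, which is the sample-size effect.

```python
# Hypothetical setup: two groups with error SDs 2 and 1, and 30 runs in total.
sigma1, sigma2, N = 2.0, 1.0, 30


def stiffness(n1, n2):
    """Stiffness holding the difference of two group means.

    Group g's mean is held by n_g springs of constant 1/sigma_g**2, a combined
    stiffness n_g / sigma_g**2. Moving the two means toward each other acts like
    those two stiffnesses in series, so the reciprocals add.
    """
    k1 = n1 / sigma1 ** 2
    k2 = n2 / sigma2 ** 2
    return 1.0 / (1.0 / k1 + 1.0 / k2)


best_n1 = max(range(1, N), key=lambda n1: stiffness(n1, N - n1))
print(f"stiffest allocation: n1={best_n1}, n2={N - best_n1}")
```

The stiffest split puts twice as many runs in the group whose SD is twice as large.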

 

Okay, so lots of metaphors carry through from springs into statistical concepts. The same is true of sample size. If we have a small sample with only a few observations, the means are not held very tightly, and it's easy to pull them together. But if we have a lot more observations, the line of fit is held by a lot more springs connected to observations, and it's going to be a lot more difficult. So as we increase sample size, we increase the power of the test: it takes a greater sum of squares to force the hypothesis. Okay, so we've covered fitting means across classifications; what about regression lines? As we saw at the beginning, this time we're balancing in two different directions: up and down, and the slope, or angle, of the line. The line of fit is where the forces governing both of those are balanced, at the minimum energy solution.

 

If we want to test the hypothesis that x doesn't affect y, then we constrain the line of fit to be horizontal. If we do that and balance out that situation, the extra energy stored in the springs, compared with the unconstrained line, is the test of the hypothesis that the slope is zero. That leads to concepts like leverage. Suppose the x's in our sample are distributed fairly closely together. Then all the springs are attached near the middle of the line, and we can pretty easily pivot the line one way or the other, because the springs constrain its up-and-down position but not its slope very tightly.

 

But if we put our observations at the ends of the line and separate them out, then we have a lot more leverage, and our hypothesis test will be more powerful. That's a basic principle of DOE: spread your x values as widely as practical in order to make a more powerful test.
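The slope test and the leverage argument fit in one sketch, assuming unit springs and hypothetical designs. Forcing the line to be horizontal adds exactly b²·Sxx to the sum of squares, where Sxx = Σ(x − x̄)². For the same true slope (0.5, noise-free, an invented value) and six runs on the range 0 to 10, spreading the x values to the ends makes Sxx, and therefore the energy the hypothesis must overcome, far larger than clustering them in the middle.

```python
def fit_line(xs, ys):
    """Least squares intercept, slope, and Sxx = sum((x - xbar)^2)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b, sxx


def slope_test_energy(xs, ys):
    """Sum of squares added by forcing the line to be horizontal, plus b^2 * Sxx for comparison."""
    a, b, sxx = fit_line(xs, ys)
    ybar = sum(ys) / len(ys)
    sse_free = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sse_flat = sum((y - ybar) ** 2 for y in ys)
    return sse_flat - sse_free, b * b * sxx


# Hypothetical designs: six runs each on 0..10, true slope 0.5, no noise.
designs = {
    "clustered": [4.0, 4.4, 4.8, 5.2, 5.6, 6.0],
    "evenly spaced": [0.0, 2.0, 4.0, 6.0, 8.0, 10.0],
    "ends only": [0.0, 0.0, 0.0, 10.0, 10.0, 10.0],
}
for name, xs in designs.items():
    ys = [1.0 + 0.5 * x for x in xs]
    added, ssr = slope_test_energy(xs, ys)
    print(f"{name:14s} added SS = {added:.3f} (b^2 * Sxx = {ssr:.3f})")
```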

 

So that's the concept of springs and testing hypotheses. But we have categorical responses as well, so how do we handle those? Well, we imagine something like tire pumps. Remember that a tire pump is very relaxed at the top. As we push the tire pump down to compress the air in it, we feel more and more force as it gets compressed toward zero distance between the end of the piston and the bottom of the cylinder.

 

So in the same way as we use springs, we can imagine tire pumps, or some other form of gas compression. We divide up our sample space with a partition between each compartment, and each partition corresponds to a line of fit. What we're doing is feeling the forces of gas pressure between all these partitions and balancing them, where the forces acting on one side may include forces from many categories (many response levels) pushing one way versus the other. As those balance out, we get the minimum gas energy in the system.

 

So if I pull this down and make the partitions unrelated to the ratio of the number of samples in each level, then I have unequal energies. This is also analogous to Big Class, where I am estimating the age proportions, the probability that someone is in a given age group in this sample of data. For age 12, I have eight observations, and that corresponds to the eight in the lowest compartment. For age 13, I have seven, and that corresponds to the seven in the next, and so on. In the last two, for 16 and 17, I have only three observations each: only three gas molecules, or tire pumps, in each of those categories.

 

Now if I let it go and reach equilibrium, so that the pressure is equalized across all these compartments, I see that the distance between each pair of partitions is proportional to the rate at which that level occurs in the sample. So the rates that I see in my sample are the minimum energy, or balancing, solution to the equations for the gas pressure across these partitions, where the widths all add up to one.
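The balancing of partitions can be simulated directly. This sketch assumes hypothetical counts (not the Big Class ages), treats each compartment as holding n_k units of gas in a width p_k, so that by Boyle's law its pressure is n_k / p_k, and lets each interior partition slide a small step toward the lower-pressure side. The widths settle at n_k / N, the sample proportions, where every compartment pushes with the same pressure.

```python
# Hypothetical counts per response level (NOT the Big Class ages).
counts = [4, 3, 2, 1]
N = sum(counts)
K = len(counts)

# Partition positions 0 = b[0] < b[1] < ... < b[K] = 1; compartment k spans b[k]..b[k+1].
b = [k / K for k in range(K + 1)]  # start with equal widths


def pressures(b):
    """Boyle's law: n_k units of gas squeezed into width p_k push with pressure n_k / p_k."""
    return [n / (b[k + 1] - b[k]) for k, n in enumerate(counts)]


step = 0.001
for _ in range(20_000):
    p = pressures(b)
    # Each interior partition j slides toward the lower-pressure side:
    # it moves right when the compartment on its left (j - 1) pushes harder.
    for j in range(1, K):
        b[j] += step * (p[j - 1] - p[j])

widths = [b[k + 1] - b[k] for k in range(K)]
print("widths   ", [round(w, 4) for w in widths])
print("pressures", [round(x, 4) for x in pressures(b)])
```

The step size is kept small so the partitions never overshoot or cross; the equilibrium itself does not depend on it.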

 

So now let's see how it works in theory. Here we have Boyle's law. We start out at a normalized position of one, where the pressure inside the cylinder is basically equal to atmospheric pressure. Then as we pressurize the cylinder, we feel greater and greater force as we push the piston farther down and compress the gas. It turns out that the pressure is 1/p, where p is the position of the piston. Of course, in an ideal situation, the pressure goes to infinity as we push the position toward zero. So we feel this pressure inversely related to p, the probability we're attributing in the statistical solution. Now we need an equation for the energy stored in the system. The energy is just the work done from position one down to whatever position p we have, and since the starting position is normalized to one, and the log of one is zero, the energy starts at zero.

 

As we go from there down to p, the integral of 1/p from p up to one is minus the log of p. So the energy stored in the gas is minus the log of the probability. This turns out to be the negative log-likelihood of a statistical system: we're doing pretty much the same thing, but in terms of negative log-likelihood rather than gas pressure.
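In symbols, with the piston position normalized so that p = 1 at atmospheric pressure, and with n_k identical cylinders loaded into compartment k (the n_k notation is introduced here for illustration). The last expression is the multinomial negative log-likelihood up to a constant. Adding a Lagrange multiplier for the constraint and setting the derivative to zero gives n_k / p_k equal to the same value in every compartment (the equal-pressure condition), and the constraint then forces that value to be N, so p_k = n_k / N.

```latex
P(p) = \frac{1}{p}, \qquad
E(p) = \int_{p}^{1} \frac{dq}{q} = -\log p, \qquad
\sum_{k} n_k\,E(p_k) = -\sum_{k} n_k \log p_k
\quad \text{subject to} \quad \sum_{k} p_k = 1
```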

 

So let's see how we can visualize this situation. Here we have the car poll example, with a sample of 13: six people chose American brands of cars, two people chose European brands, and five people chose Asian brands.

 

So we load up pressure cylinders into these compartments, where we can move the line between the compartments one way or the other, and each cylinder is identical. What we want is to balance the net pressure on each partition, so that it balances out to the minimum energy stored in all these gas cylinders. It turns out you do that just by estimating probabilities: the maximum likelihood solution for storing those energies turns out to be the rates in the sample, six out of 13, two out of 13, and five out of 13, and that minimizes the total energy in those gas cylinders. You can also represent the gas cylinders as just units of gas pushing in these compartments, and think of the kinetic theory of gases doing pretty much the same thing as the pressure cylinders do. So how do you do this across groups? Well, let's say that you divide up your car poll example by group.
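A check of the car poll claim using the counts quoted in the talk (6 American, 2 European, 5 Asian). This sketch assumes one unit of gas per respondent, so the total energy is −Σ n_k log p_k; it compares the balanced partitions p_k = n_k / 13 with 1000 randomly placed partitions (an arbitrary choice of seed and sample size), and confirms that every compartment pushes with the same pressure at balance.

```python
import math
import random

# Car poll counts from the talk: 13 respondents.
counts = {"American": 6, "European": 2, "Asian": 5}
N = sum(counts.values())


def gas_energy(probs):
    """Total cylinder energy, -sum n_k * log(p_k): the multinomial negative log-likelihood."""
    return -sum(n * math.log(probs[k]) for k, n in counts.items())


# Balanced partitions: each compartment's width is its share of the sample.
mle = {k: n / N for k, n in counts.items()}
# At balance every compartment pushes with the same pressure n_k / p_k (equal to N here).
pressures = {k: counts[k] / mle[k] for k in counts}

# Compare with randomly placed partitions (uniform random points on the probability simplex).
rng = random.Random(1)


def random_partition():
    draws = {k: rng.expovariate(1.0) for k in counts}
    total = sum(draws.values())
    return {k: d / total for k, d in draws.items()}


e_min = gas_energy(mle)
e_others = [gas_energy(random_partition()) for _ in range(1000)]
print(f"energy at n/N: {e_min:.4f}; lowest of 1000 random partitions: {min(e_others):.4f}")
print("pressures:", {k: round(v, 6) for k, v in pressures.items()})
```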

 

So you have different groups corresponding to young people, middle-aged people, and older people. They have different preferences, so they're going to have different response counts for each of the car preferences: American, European, and Asian brands. If we then solve the same system by minimizing the energy and balancing the pressure on these partitions, it's going to estimate different rates in the different groups. Our next step is to think about how to test the hypothesis. Testing the hypothesis that the rates are the same across these groups means moving these partitions to line up across groups, and balancing to the minimum energy solution subject to that constraint. The amount of energy I've had to add to the system in order to do that is the difference in negative log-likelihood.
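Here is the across-groups test as arithmetic. The talk doesn't give the counts for the three age groups, so the table below is hypothetical. The unconstrained energy lets each group set its own partitions (rates = count / row total); the constrained energy forces the partitions to line up (rates = column total / N). Twice the added energy is the likelihood-ratio statistic G² = 2 Σ O log(O/E); the p-value uses the closed-form chi-square upper tail, which holds only for an even number of degrees of freedom (here 4).

```python
import math

# Hypothetical counts (the talk does not give them).
# Columns: American, European, Asian.
table = {
    "young": [2, 3, 5],
    "middle": [6, 2, 4],
    "older": [8, 1, 2],
}
rows = list(table.values())
N = sum(sum(row) for row in rows)
col_totals = [sum(col) for col in zip(*rows)]

# Free partitions: each group balances at its own rates, count / row total.
energy_free = -sum(o * math.log(o / sum(row)) for row in rows for o in row if o > 0)
# Aligned partitions: every group forced to the pooled rates, column total / N.
energy_tied = -sum(o * math.log(c / N) for row in rows for o, c in zip(row, col_totals) if o > 0)

added = energy_tied - energy_free   # energy pumped in to align the partitions
g2 = 2.0 * added                    # likelihood-ratio chi-square
df = (len(rows) - 1) * (len(col_totals) - 1)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds only for even df."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


p_value = chi2_sf_even_df(g2, df)
print(f"added energy = {added:.4f}, G^2 = {g2:.4f}, df = {df}, p = {p_value:.4f}")
```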

 

Doubling the energy added under that constraint gives the likelihood-ratio chi-square test. It's the same idea as the sum of squares, but using gas pressure instead of springs. The same thing can be done with a continuous x, where I think of a tire pump, or gas pressure cylinder, at each position along the x axis. They have to fit into these compartments, where the partition lines are constrained to follow a logistic or probit curve, and then I balance all those positions. I see here that this tire pump has had to be compressed a lot, so it's like an outlier in a categorical sense: its x value is far from where that response would be expected, so it's an unlikely value, but it corresponds to fitting the situation. Now think about these gas pressure cylinders in terms of the strength of relationships. With a near-perfect fit in logistic regression, I have complete separation: some value of x separates one response category from another response category. If I have a strong relationship, then instead of complete separation there's partial separation, but there's a lot more of one response in one region versus another. And if I have a weak relationship, these gas pressure cylinders are fairly equally distributed across x, so the line of fit is close to a flat line. With a weak relationship, the energy needed to constrain it to a perfectly flat line will be very small compared to what I'd have to add to a system with a near-perfect fit.
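A sketch of the continuous-x case, assuming hypothetical data and plain Newton-Raphson (not necessarily what JMP does internally). Each point's energy is −log of the probability the fitted logistic curve assigns to its observed level, so the most compressed cylinder is the point with the largest term. Forcing the curve to be flat (slope 0) adds energy, and twice that added energy is the likelihood-ratio chi-square with 1 degree of freedom.

```python
import math

# Hypothetical data: x is continuous, y is a two-level response coded 0/1.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]


def prob_one(a, b, x):
    """Logistic curve: probability of level 1 at x."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def point_energy(a, b, x, y):
    """Gas energy for one point: -log of the probability assigned to its observed level."""
    p1 = prob_one(a, b, x)
    return -math.log(p1 if y == 1 else 1.0 - p1)


def energy(a, b):
    return sum(point_energy(a, b, x, y) for x, y in zip(xs, ys))


def fit(max_iter=50, tol=1e-12):
    """Newton-Raphson on the energy (negative log-likelihood) for intercept a and slope b."""
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        ga = gb = haa = hab = hbb = 0.0
        for x, y in zip(xs, ys):
            p = prob_one(a, b, x)
            ga += p - y                 # gradient of the energy
            gb += (p - y) * x
            w = p * (1.0 - p)           # Hessian weights
            haa += w
            hab += w * x
            hbb += w * x * x
        det = haa * hbb - hab * hab
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        a, b = a - da, b - db
        if abs(da) + abs(db) < tol:
            break
    return a, b


a, b = fit()
# Constrained fit: a flat curve at the overall rate of level 1.
rate = sum(ys) / len(ys)
a_flat = math.log(rate / (1.0 - rate))
added = energy(a_flat, 0.0) - energy(a, b)
lr_chi2 = 2.0 * added
p_value = math.erfc(math.sqrt(lr_chi2 / 2.0))  # chi-square upper tail with 1 df

# The most compressed cylinder: the point with the largest energy under the fit.
worst = max(zip(xs, ys), key=lambda xy: point_energy(a, b, *xy))
print(f"a={a:.4f}, b={b:.4f}, LR chi-square={lr_chi2:.4f}, p={p_value:.4f}, most compressed point={worst}")
```

These made-up data are deliberately not separated; with complete separation the slope would grow without bound and Newton's method would not settle.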

 

So all this goes into designing the graphs, so that when we see a graph, we think about forces and energy. Let's think about the continuous case. When we think about fitting a mean, we think of it like a vertical scatterplot with springs attached, and this is the balance point that balances all those springs for a univariate continuous response. For a one-way situation, we think about fitting a different mean to each group, with springs connected to each point. Then if I constrain the means to a single flat line, how much energy I would have to add to the system gives the sum of squares due to that hypothesis. Similarly with regression, I think about springs attached from each point to the line of fit, and how much energy I would have to add in order to force that line of fit to be horizontal. And similarly, in the categorical case, I think about gas pressure when I draw these divided bar charts. Now, a divided bar chart is not very good for estimating rates compared to separate single bars. But a divided bar chart is good for visualizing the gas pressure picture of the solution: equalizing the pressure in each compartment of these chambers.

 

When I have several groups instead of just one, I get the mosaic plot, where these divided bar charts are side by side, with widths proportional to the group sizes. I can think of how much energy I have to add to the system to make these compartments align across the groups, and that's the likelihood ratio test for that hypothesis. This is one reason I like the likelihood ratio test better than the Pearson test, even though both are similarly powerful in most categorical situations: I can think of the gas pressure when I do this. It's the same thing when I think of compartments in a continuous logistic regression. In this case, I think of these points as air molecules bouncing back and forth vertically. With only a few points out here, the line of fit is dominated more by the other points, and I can think of logistic regression that way. So that's the way I think of visualization: seeing the forces and energies in graphs, which you can't see directly. It's not something that's visual; it's something you sense as you use these mechanical metaphors, Hooke's law and Boyle's law, to drive the statistical concepts.