Exploring Basic Linear Regression Visually Through Structural Equation Modelling...

Hi, I'm Jordan Walters,

I'm a Technical Intern at JMP

and I'm going to present to you today

about Structural Equation Modeling

and how we can use it alongside linear regression

to actually find relationships

which otherwise might not be obvious.

So to do this, we're going to have a look

at a case study about the manufacturing of pharmaceutical tablets.

Within this case study, there are lots of factors and responses

that need monitoring, and we need to try to determine the relationships

between these factors and these responses to work out how the system works.

And doing this is going to allow us to find

the optimal conditions to make the best tablet possible.

But to make this entire case study a lot simpler,

rather than focusing on all of the responses and all of the factors,

we're just going to focus on one response and one factor.

So in this case,

the one factor we're looking at is the percent composition of water

and the one response that we're going to look at

is the density of the tablet.

And that's going to be our metric for whether or not

we have a good tablet.

So just for reference,

the tablet density is in milligrams per cubic centimeter.

So typically, how we define the relationship

between this response and factor

is just through a simple linear regression

which from the name,

gives us a linear relationship between the response and factor.

And this is often what we rely on to understand the system.

But the problem with linear regression is it's only going to allow us to find

the direct effect of changing

the water composition on the density of the tablet.

It's not going to let us find

anything else about the system or any underlying features.

And so because of this,

it might actually fail to give us a full picture of the system.

And this is where Structural Equation Modeling

can come in, to give us a more comprehensive view of the system,

which is going to allow us to find different relationships

which may be more subtle and not immediately obvious.

So, a bit of background on linear regression:

in its simplest form, linear regression

is just a linear relationship between one variable and another.

And since linear regression is, by definition, linear,

the two variables are connected by a linear equation.

In this case, the equation is Y = mX + C,

where Y is the response,

X is the factor, m is our gradient, and C is our offset, or Y-intercept.

So m and C are just constants that we're going to find

to fill in this regression equation

so that we can relate our Y and X variables.
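As a minimal sketch of how m and C can be found, assuming ordinary least squares (the standard fitting method for a simple linear regression, and not necessarily JMP's internal code), here is a short Python version. The data points are made up for illustration and are not the case-study measurements.

```python
def fit_line(xs, ys):
    """Ordinary least squares for Y = mX + C; returns (m, c)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are identical, so the gradient is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx          # gradient
    c = y_bar - m * x_bar  # offset (Y-intercept)
    return m, c


# Made-up points that lie exactly on Y = 2X + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, c = fit_line(xs, ys)
print(m, c)  # 2.0 1.0
```

With real, noisy data the points won't sit exactly on a line, and m and C are simply the values that make the squared vertical distances to the line as small as possible.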

So to do this in the context of the case study,

we're going to plot the X and Y variables

to try and come up with this regression equation in JMP.

And JMP can very simply do this for us.

It gives us a lot of information,

some of which we don't need for performing this linear regression.

But the two really important pieces of information that we get

are the regression coefficients, which are our m and our C,

which are about here,

and the regression equation,

which is the full form of our linear regression.

Now, JMP does give this to us in the form of Y = mX + C,

but for clarity, and to see how this works

in our case study, it's presented here as a proper relationship:

Density = 0.117 × %H2O + 95.903.

And this is reported above with the m and C values.

Simply put, all this relationship means

is that for every one-unit increase in percent H2O,

we get a 0.117-unit increase in density.

But we have that C value at the end, which offsets our density by 95.903.
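To make that interpretation concrete, the fitted equation can be written as a function using the two coefficients reported above (0.117 and 95.903). The function and constant names are illustrative, and the inputs 2 and 3 are arbitrary values one percentage point apart, not measurements from the study.

```python
M_SLOPE = 0.117    # gradient reported for % H2O
C_OFFSET = 95.903  # offset (intercept) reported for density


def predicted_density(pct_h2o):
    """Fitted tablet density for a given % H2O composition."""
    return M_SLOPE * pct_h2o + C_OFFSET


# Any two inputs one percentage point apart differ by m, not by 1.
step = predicted_density(3.0) - predicted_density(2.0)
print(round(step, 3))  # 0.117
```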

The purpose of the C value is to give a scale to our data.

Without it, the line would be forced through the origin, so zero water would predict zero density,

and the gradient alone would have to account for the full size of the density values.

Adding this constant at the end puts the predictions on the actual scale

of the densities we're studying within this case study.
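One way to see what the constant does is to refit the same points with C forced to zero. The sketch below uses made-up data with a large baseline and a small upward trend (not the case-study data). With C included, the fit recovers the small gradient; forced through the origin, the gradient has to absorb the whole baseline and the fit gets much worse.

```python
def fit_line(xs, ys):
    """Ordinary least squares with an intercept: returns (m, c)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return m, y_bar - m * x_bar


def fit_through_origin(xs, ys):
    """Least-squares gradient when C is fixed at 0 (line through the origin)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def sum_sq_error(xs, ys, m, c):
    """Sum of squared vertical distances from the points to Y = mX + C."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))


# Made-up data: baseline near 96 with a gentle upward trend
xs = [1.0, 2.0, 3.0, 4.0]
ys = [96.1, 96.2, 96.3, 96.4]

m, c = fit_line(xs, ys)
m0 = fit_through_origin(xs, ys)
print(round(m, 3), round(c, 3))  # 0.1 96.0
print(round(m0, 3))              # 32.1
print(sum_sq_error(xs, ys, m, c) < sum_sq_error(xs, ys, m0, 0.0))  # True
```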

So now that we've got an idea of how we traditionally

go about looking at this problem,

let's try and start building the Structural Equation Model.

So to transform traditional linear regression

into the form of a Structural Equation Model,

we need to do a few things.

We begin by moving this relationship

away from the scatterplot and equation, into this path diagram.

Here, we can see that we've got our X factor

in one rectangle, linked to the other rectangle,

which is our response.

These two rectangles are linked by this arrow.

Now, a single-headed arrow just means a relationship going one way.

In Structural Equation Modeling, it is also possible to have double-headed arrows,

but that's not particularly important in this example.

What the single-headed arrow means

is that H2O composition affects the density of the tablet,

but the density of the tablet does not affect the H2O composition.

And in this instance, that lets us take

an individual X and Y and perform a linear regression.

Now, the confusing part about this diagram

is probably the 1 in the triangle at the top.

Simply put, all it does (and it's usually hidden

within a Structural Equation Model)

is represent the constant term, which sets the scale, or offset, for our entire model.

And we can see that through

how this Structural Equation Model actually reports the data.

So if you look along the path

between the 1 in the triangle and the Y response, the tablet density,

you'll see that we actually get a number on it,

which is 95.903, a value we've seen before.

It's actually our C value from our linear regression.

And this confirms that this is the part of the diagram

that provides that scale.

So if that's where the C comes from,

where does the m come from?

Well, the m comes from the connection at the bottom there,

from the percent H2O composition to the density of the tablet.

And we can see that it's now 0.117, exactly the same as we found.

Bear in mind here that these two analyses were performed

in two different platforms.
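The mapping from diagram to equation can be sketched in a few lines. This is a hypothetical representation, not how JMP stores a model: each single-headed arrow is a (source, target) key holding its estimate, the node named "1" stands for the constant in the triangle, and the two estimates are the ones reported above. Collecting the arrows that point into a variable rebuilds its equation.

```python
# Hypothetical edge list for the simple path diagram: (source, target) -> estimate.
# "1" is the constant in the triangle; only the two paths discussed above are included.
paths = {
    ("1", "Density"): 95.903,                # C, the offset
    ("%H2O Composition", "Density"): 0.117,  # m, the gradient
}


def implied_equation(target, paths):
    """Write out the equation for `target` from the arrows pointing into it."""
    terms = []
    for (source, dest), estimate in paths.items():
        if dest == target:
            terms.append(str(estimate) if source == "1" else f"{estimate}*[{source}]")
    return f"{target} = " + " + ".join(terms)


def predict(target, paths, values):
    """Evaluate the equation for `target`; the constant node always contributes 1."""
    total = 0.0
    for (source, dest), estimate in paths.items():
        if dest == target:
            total += estimate * (1.0 if source == "1" else values[source])
    return total


print(implied_equation("Density", paths))
# Density = 95.903 + 0.117*[%H2O Composition]
```

The point of the design is that an arrow out of the constant behaves exactly like an arrow out of a variable whose value is always 1, which is why the offset and the gradient sit side by side in the diagram.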

Now, we have the two parts of our Y = mX + C equation there.

But this schematic is a triangle with three sides,

so you might be wondering: what is that final side?

Simply put, the left-hand side,

the path from the number 1 to %H2O Composition,

is just the value the SEM reports for X itself.

Just as the C value, which gives us the offset of Y,

is the Y-intercept of the graph,

which is typically important and quite useful statistically,

the Structural Equation Model also gives us this X value.

Because no arrow points into %H2O Composition, it's the estimated mean of X rather than an intercept.

It isn't particularly important for our question,

but it's interesting that the analysis can provide us

with this additional information without our really asking for it.
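A small sketch of where those two constant-path numbers come from, assuming a standard maximum-likelihood SEM with a mean structure and complete data (an assumption; the talk does not describe JMP's estimator). Under that assumption the constant-to-X path reports the sample mean of X, while the constant-to-Y path is the intercept C = mean(Y) - m * mean(X), which is why the fitted line always passes through the point of means. The data are made up.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares for Y = mX + C; returns (m, c)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return m, y_bar - m * x_bar


# Made-up data, not the case-study measurements
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

m, c = fit_line(xs, ys)
mean_x = fmean(xs)  # under the ML assumption above: the constant -> X path for an exogenous X
mean_y = fmean(ys)

print(round(m, 3), round(c, 3), mean_x)  # 0.6 2.2 3.0
# The fitted line passes through (mean_x, mean_y):
print(abs(m * mean_x + c - mean_y) < 1e-12)  # True
```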

This is all computed automatically like I say.

And typically, these top parts are hidden;

they're shown here just to give us context as to what's actually happening.

So, looking at what we've actually got here,

we've got these solid lines,

which denote that a path is statistically significant.

And between the H2O composition and the density of the tablet,

we have this dashed line, which denotes

that that relationship is not statistically significant.

And the estimate itself is small, at 0.117,

which suggests the relationship is weak anyway.
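For a sense of how a path gets flagged as significant or not, here is a sketch of a Wald-type test on a regression gradient: the estimate divided by its standard error, with a two-sided p-value. It assumes ordinary least squares, the conventional 0.05 cut-off, and a normal approximation to the reference distribution, which is reasonable for large samples (small samples usually call for a t-distribution). The talk does not say exactly which test or threshold JMP uses for the dashed lines, and the data are made up.

```python
import math
from statistics import NormalDist


def gradient_test(xs, ys):
    """Return (m, se, z, p) for the OLS gradient; p uses a normal approximation."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three observations to estimate the error variance")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    c = y_bar - m * x_bar
    sse = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(sse / (n - 2) / sxx)  # standard error of the gradient
    if se == 0:
        raise ValueError("perfect fit: the standard error is zero")
    z = m / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return m, se, z, p


def is_significant(p, alpha=0.05):
    """Solid line if True, dashed line if False (assumed cut-off)."""
    return p < alpha


# Made-up data; with only five points the normal approximation is rough,
# so this shows the mechanics rather than a trustworthy p-value.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
m, se, z, p = gradient_test(xs, ys)
print(round(m, 3), round(se, 3), round(z, 2))  # 0.6 0.283 2.12
```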

So if we just took this at face value,

you might be inclined to agree with our initial linear regression

and say there's no effect

from H2O composition on the density of the tablet.

But since we've already got our data

in the form of a Structural Equation Model,

let's continue exploring it

and see if we can uncover any hidden relationships.

So we know that this entire case study

is built up of lots of X factors and lots of Y responses,

and we've chosen to simplify it down to just one factor and one response.

So let's think about some of the other variables within the system

and how they might relate to this response.

One example, which occurs quite a lot within this case study,

is a mediation variable.

And by a mediation variable, we mean

a variable which can't be controlled directly,

but has an impact on our final response

while still being affected by one of the factors that we can control.

So in this case, we're going to take a look at crushing strength

as being one of our mediation variables.

And the reason this is a mediation variable

is that the crushing strength can be changed in this process,

and it does have an effect on the density of the tablet.

But the crushing strength can only be set within

a certain operational range,

depending on the water composition within the tablet.

If the tablet's too dry,

we need to change the crushing strength to match that

so it doesn't completely powder the tablet.

If it's too moist, with too much water in there,

then it might need a much softer crushing.

So in this way, you can see that this is a variable

that we can't change directly,

because it's affected by the water composition,

but it does matter

for the density of the tablet, as we'll explore now.

So if we add this into the path diagram of our Structural Equation Model,

you can see that we get a new triangle between our variables,

this time with the number 1 in the middle.

And what this is saying, in the context of the case study,

is that now we have our direct effect,

which is the connection

between the H2O composition and the density of the tablet,

and we also have our indirect effect,

which comes from the path that goes all the way around the diagram,

through crushing strength and into the tablet density.

Now, what this means is that this is actually a two-part equation.

Not only do we have the C as before, when there was one X variable;

we now also have another variable sitting in the middle of the equation.
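Written out, the mediation model is a pair of linear equations: crushing strength M = a0 + a·X, and density Y = c0 + c'·X + b·M, where c' is the direct path and a and b together make up the indirect path. These symbols are conventional mediation notation, not labels from the talk. For a saturated, recursive model like this one, the maximum-likelihood path estimates coincide with fitting each equation separately by ordinary least squares, which is what this sketch assumes. The data are made up, generated exactly from Y = 1 + 0.5X + 2M so the fit can be checked.

```python
def fit_simple(xs, ys):
    """OLS for y = c0 + slope * x; returns (c0, slope)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope


def fit_two(xs, ms, ys):
    """OLS for y = c0 + c_direct * x + b * m, via the 2x2 normal equations."""
    n = len(xs)
    x_bar, m_bar, y_bar = sum(xs) / n, sum(ms) / n, sum(ys) / n
    dx = [x - x_bar for x in xs]
    dm = [m - m_bar for m in ms]
    dy = [y - y_bar for y in ys]
    s11 = sum(u * u for u in dx)
    s22 = sum(v * v for v in dm)
    s12 = sum(u * v for u, v in zip(dx, dm))
    s1y = sum(u * w for u, w in zip(dx, dy))
    s2y = sum(v * w for v, w in zip(dm, dy))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("X and M are perfectly collinear")
    c_direct = (s1y * s22 - s2y * s12) / det
    b = (s2y * s11 - s1y * s12) / det
    return y_bar - c_direct * x_bar - b * m_bar, c_direct, b


# Made-up data: X (water), M (crushing strength), Y = 1 + 0.5*X + 2*M exactly
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ms = [2.0, 1.0, 4.0, 3.0, 6.0]
ys = [1 + 0.5 * x + 2 * m for x, m in zip(xs, ms)]

a0, a = fit_simple(xs, ms)             # first equation: M on X
c0, c_direct, b = fit_two(xs, ms, ys)  # second equation: Y on X and M
print(round(a, 3), round(c_direct, 3), round(b, 3), round(c0, 3))  # 1.0 0.5 2.0 1.0
```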

Because of that, the first thing we notice on the path diagram

is that the connection between the H2O composition

and the density of the tablet is now 0.054,

which is already noticeably smaller than the 0.117 we saw last time,

when we included just the one variable.

And this is because the rest of the effect

now comes through another path, which is what we're exploring here.
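That "rest of the effect" can be made precise. When both models are fitted by least squares to the same rows, the gradient from the simple regression splits exactly into the direct path plus the product of the two paths through the mediator: c = c' + a·b. If that holds for the two models here (an assumption, since it requires identical data and equivalent estimation), the indirect route would carry roughly 0.117 - 0.054 = 0.063. The sketch checks the identity on made-up data.

```python
def fit_simple(xs, ys):
    """OLS gradient of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx


def fit_two(xs, ms, ys):
    """OLS gradients (c_direct, b) for y = c0 + c_direct * x + b * m."""
    n = len(xs)
    x_bar, m_bar, y_bar = sum(xs) / n, sum(ms) / n, sum(ys) / n
    dx = [x - x_bar for x in xs]
    dm = [m - m_bar for m in ms]
    dy = [y - y_bar for y in ys]
    s11 = sum(u * u for u in dx)
    s22 = sum(v * v for v in dm)
    s12 = sum(u * v for u, v in zip(dx, dm))
    s1y = sum(u * w for u, w in zip(dx, dy))
    s2y = sum(v * w for v, w in zip(dm, dy))
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det


# Made-up, noisy data: X = water, M = crushing strength, Y = density
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ms = [2.0, 2.6, 4.1, 3.8, 5.3, 6.2]
ys = [95.9, 96.4, 96.8, 96.7, 97.5, 97.9]

total = fit_simple(xs, ys)         # c: the simple-regression gradient
a = fit_simple(xs, ms)             # X -> M
c_direct, b = fit_two(xs, ms, ys)  # X -> Y (direct) and M -> Y
indirect = a * b

print(abs(total - (c_direct + indirect)) < 1e-9)  # True
```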

So again, we can see that that connection

is still not statistically significant,

and in fact, it's an even weaker relationship than before.

Now, if we look at our other connections,

say, between our crushing strength and the density of the tablet,

we see a statistically significant

and reasonably strong relationship, at 0.593.

So what does this actually mean in the context of this case study?

Well, it means a couple of things

and we can draw a couple of conclusions from this.

Firstly, the H2O composition does relate to the density of the tablet,

but indirectly, through crushing strength,

which isn't something we knew before; it was hidden

behind the fact that we only performed the linear regression.

And the second thing that we've learned

is that crushing strength and the density of the tablet

are correlated.
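A common way to check whether an indirect effect such as H2O → crushing strength → density is distinguishable from zero is a percentile bootstrap of the product a·b: resample the rows with replacement, refit, and take the middle 95% of the resampled products. The talk does not say how JMP tests indirect effects, so treat this as a generic sketch. The data, the 2,000 resamples, and the seed are arbitrary choices, and an interval that excludes zero would usually be read as a significant indirect effect.

```python
import random


def slope(xs, ys):
    """OLS gradient of y on x, or None if x has no spread."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx <= 1e-12:
        return None
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx


def mediator_coef(xs, ms, ys):
    """Coefficient b on M in y = c0 + c'x + bm (OLS); None if X and M are near-collinear."""
    n = len(xs)
    x_bar, m_bar, y_bar = sum(xs) / n, sum(ms) / n, sum(ys) / n
    dx = [x - x_bar for x in xs]
    dm = [m - m_bar for m in ms]
    dy = [y - y_bar for y in ys]
    s11 = sum(u * u for u in dx)
    s22 = sum(v * v for v in dm)
    s12 = sum(u * v for u, v in zip(dx, dm))
    s1y = sum(u * w for u, w in zip(dx, dy))
    s2y = sum(v * w for v, w in zip(dm, dy))
    det = s11 * s22 - s12 ** 2
    if det <= 1e-9 * s11 * s22:
        return None
    return (s2y * s11 - s1y * s12) / det


def indirect_effect(xs, ms, ys):
    """a * b: the X -> M gradient times the M -> Y coefficient."""
    a = slope(xs, ms)
    b = mediator_coef(xs, ms, ys)
    return None if a is None or b is None else a * b


def bootstrap_ci(xs, ms, ys, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap interval for a * b (degenerate resamples are skipped)."""
    rng = random.Random(seed)
    n = len(xs)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        est = indirect_effect([xs[i] for i in idx], [ms[i] for i in idx], [ys[i] for i in idx])
        if est is not None:
            draws.append(est)
    draws.sort()
    k = len(draws)
    return draws[int(alpha / 2 * (k - 1))], draws[int((1 - alpha / 2) * (k - 1))]


# Made-up data: X = water, M = crushing strength, Y = density
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ms = [3.0, 2.2, 4.5, 3.9, 6.0, 5.1, 7.4, 6.6, 8.8, 8.0]
ys = [95.6, 95.3, 96.2, 96.0, 96.9, 96.4, 97.5, 97.1, 98.2, 97.8]

point = indirect_effect(xs, ms, ys)
low, high = bootstrap_ci(xs, ms, ys)
print(point, (low, high))
```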

So in a practical sense, we can conclude

that if we want to control the density of our tablet,

then this must be done through changing the H2O composition,

but only as a means to alter that crushing strength

to the desired value.

And so if we actually came to optimize

this entire case study,

crushing strength is where we want to pay attention,

along with anything we can do to affect it,

since it isn't something we can change with just a dial.

So in this way, you can see how SEM has allowed us to uncover relationships

which may be missed in traditional modeling methods.

And in turn, it provides a much deeper understanding of how our system works.

So next time you're exploring a system, I'd encourage you to consider applying

SEM to help you better understand

the system, both visually and in more depth.