Hi, I'm Jordan Walters,
I'm a Technical Intern at JMP
and I'm going to present to you today
about Structural Equation Modeling
and how we can use this in linear regression
to actually find relationships
which otherwise might not be obvious.
So to do this, we're going to have a look
at a case study about manufacturing of pharmaceutical tablets.
Within this case study, there's lots of factors and responses
that need monitoring and we need to try and determine the relationship
between these factors and these responses to try and work out how the system works.
And in doing this, this is going to allow us to find
the optimal conditions for our tablet to make the best tablet possible.
But to make this entire case study a lot simpler,
rather than focusing on all of the responses and all of the factors,
we're just going to focus on one response and one factor.
So in this case,
the one factor we're looking at is the percent composition of water
and the one response that we're going to look at
is the density of the tablet.
And that's going to be the metric of whether or not
we have the best tablet.
So just for reference,
the tablet density is in milligrams per centimeter cubed.
So typically, how we define the relationship
between this response and factor
is just through a simple linear regression
which from the name,
gives us a linear relationship between the response and factor.
And this is what we need to fully understand the system.
But the problem with linear regression is it's only going to allow us to find
the direct effects between changing
the water composition and the density of the tablet.
It's not going to let us find
anything else about the system or any underlying features.
And so because of this,
it might actually fail to give a full impression of the system.
And this is where Structural Equation Modeling
can actually come in to try and give us a more comprehensive view of the system
which is going to allow us to find different relationships
which may be more subtle and not immediately obvious.
So a bit of background on linear regression,
in its simplest form, linear regression
is just a linear relationship between one variable to another.
And since linear regression is by definition, linear,
this is going to be connected by a linear equation.
And this equation, in this case, is Y equals mX plus C,
where Y is the Y response,
X is the X factor, m is our gradient, and C is our offset.
So the m and the C are just constants that we're going to find
to fill this regression equation
so that we can relate our Y and X variables.
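To make the m and the C concrete, here is a minimal sketch in Python of how a least-squares line can be found by hand. The `water_pct` and `density` readings and the `fit_line` helper are invented for illustration; they are not the case study data, and this is not a description of how JMP computes its fit.

```python
# Hypothetical readings, invented for illustration (not the case study data).
water_pct = [1.0, 2.0, 3.0, 4.0, 5.0]
density = [96.1, 96.0, 96.4, 96.3, 96.6]


def fit_line(x, y):
    """Return (m, c) for the least-squares line y = m*x + c."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    m = sxy / sxx              # gradient
    c = y_mean - m * x_mean    # offset (Y-intercept)
    return m, c


m, c = fit_line(water_pct, density)
print(f"density = {m:.3f} * water_pct + {c:.3f}")
```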
So to do this in the context of the case study,
we're going to plot the X and Y variables
to try and come up with this regression equation in JMP.
And JMP can very simply do this for us.
It gives us a lot of information,
some of which we don't need for performing this linear regression.
But the two pieces of really important information that we get
is the regression coefficients which is our m and our C,
which are about here,
and the regression equation
which is the full form of our linear regression.
Now JMP does give this to us in the form of Y equals mX plus C
but for the purposes of clarity, and to see how this works
in our case study, it is presented here as a proper relationship,
being that the density is equal to 0.117 times the percent of water plus 95.903.
And this is reported above with your m and C values.
Well, you might simply see that all this relationship means
is that for every one unit increase in percent H_2O,
we get a 0.117 unit increase in density.
But we have that C value at the end which offsets our density by 95.903.
The purpose of the C value is to give a scale to our data.
Without that scale, what we have is a relationship
between density and percent of composition of H_2O.
Adding this constant at the end gives us that scale
to actually be the units that we need to study within this case study.
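As a small worked example, the reported equation can be written as a function. The coefficients 0.117 and 95.903 are the values quoted in the talk; the function name `predicted_density` and the water percentages 2.0 and 3.0 are arbitrary choices for illustration.

```python
M = 0.117    # gradient quoted in the talk
C = 95.903   # offset quoted in the talk


def predicted_density(water_pct):
    """Predicted tablet density (mg/cm^3) for a given percent of water."""
    return M * water_pct + C


# Raising the water percentage by one unit changes the prediction by M.
step = predicted_density(3.0) - predicted_density(2.0)
# With no water at all, the prediction is just the offset C.
baseline = predicted_density(0.0)
print(f"change per unit of water: {step:.3f}")
print(f"prediction at zero water: {baseline:.3f}")
```

Here `step` comes out as the gradient (up to floating-point rounding), and `baseline` is the offset, which is what puts the prediction on the density scale.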
So now that we've got an idea of how we traditionally
go about looking at this problem,
let's try and start building the Structural Equation Model.
So to transform traditional linear regression
into the form of a Structural Equation Model,
we need to do a few things.
We begin by moving this relationship
away from its graphical form, into this visual form.
Well, we can see that we've got our X factor
in the rectangle being linked to the other rectangle
which is our response.
And between these two rectangles, they are linked by this arrow.
Now, a single-headed arrow just means a relationship going one way.
And in Structural Equation Modeling, it is possible to have a double-headed arrow
but that's not particularly important in this example.
What the single-headed arrow means
is that H_2O composition affects the density of the tablet
but the density of the tablet does not affect the H_2O composition.
And this allows us, in this instance,
to fill in this individual X and Y and perform linear regression.
Now the confusing part about this graph
is probably the one in the triangle at the top.
And simply, all this does, and this is usually hidden
within a structural equation model,
is it allows us to set this scale for our entire model.
And we can see that through
how this Structural Equation Model actually reports the data.
So if you look across the axis
between the one in the triangle and the Y response, the density of the tablet,
you'll see that we actually get a number on that,
which is 95.903, which we've seen before.
It's actually our C value from our linear regression.
And this further proves that this is the part of the graph
that is used to give the scale.
So we can see where the C comes from,
and that directly translates. So where does the m come from?
Well, the m is actually from the connection at the bottom there
between the percent H_2O composition and the density of the tablet.
And we can see that it's 0.117, the exact same as we found.
Bear in mind here that both of these analyses were performed
in two different platforms.
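One way to read the 1 in the triangle is as an extra input whose value is always 1, so the number on its arrow plays the role of C. The sketch below, assuming made-up data and a hypothetical helper `fit_with_constant`, regresses the response on two columns (the constant 1 and X) with no separate intercept term. It is meant to illustrate the idea, not to reproduce the estimation that JMP's SEM platform performs.

```python
# Hypothetical readings, invented for illustration (not the case study data).
water_pct = [1.0, 2.0, 3.0, 4.0, 5.0]
density = [96.1, 96.0, 96.4, 96.3, 96.6]


def fit_with_constant(x, y):
    """Regress y on two inputs, a constant 1 and x, with no separate intercept.

    Returns (path_from_one, path_from_x), the two coefficients that would sit
    on the arrows pointing into the response in the path diagram.
    """
    ones = [1.0] * len(x)
    # Normal equations (A^T A) b = A^T y for the two columns [ones, x].
    a11 = sum(o * o for o in ones)
    a12 = sum(o * xi for o, xi in zip(ones, x))
    a22 = sum(xi * xi for xi in x)
    r1 = sum(o * yi for o, yi in zip(ones, y))
    r2 = sum(xi * yi for xi, yi in zip(x, y))
    # Solve the 2x2 system with Cramer's rule.
    det = a11 * a22 - a12 * a12
    path_from_one = (r1 * a22 - a12 * r2) / det
    path_from_x = (a11 * r2 - a12 * r1) / det
    return path_from_one, path_from_x


path_from_one, path_from_x = fit_with_constant(water_pct, density)
print(f"arrow from the constant 1: {path_from_one:.3f}")  # plays the role of C
print(f"arrow from water_pct:      {path_from_x:.3f}")    # plays the role of m
```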
Now, we might have two parts of our Y equals mX plus C equation there.
But this schematic is in the form of three sides.
And so you might be wondering what is that final side?
And simply put, the left hand side,
which is essentially the number one and %H_2O Composition,
is just the X in the SEM.
Similar to how the C value,
which gives us the offset of the Y in the Y equals mX plus C graph,
is the Y-intercept of the graph,
which is typically important and quite useful statistically,
we do also get this X value given to us by the Structural Equation Model,
which isn't particularly important
because we don't have a good look at the X-intercept
but it's interesting that the analysis can provide us
with this additional information without really asking for it.
This is all computed automatically like I say.
And then typically, this top part is hidden,
but it's shown here just to give us context as to what's actually happening here.
And so looking at what we've actually got here,
we've got these solid lines
which denote that something is statistically significant.
And between the H_2O composition and the density of the tablet,
we have this dashed line which actually denotes
that that is not a statistically significant correlation.
And the fact that the correlation is very low, at 0.117,
implies it's not that well correlated anyway.
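A rough sketch of what a significance check on a single path involves: divide the estimated coefficient by its standard error and ask how likely a value that far from zero would be by chance. The data, the helper name `slope_z_test`, and the use of a normal approximation for the p-value are all assumptions made for illustration. For small samples a t distribution would be more appropriate, and the exact rule JMP uses to draw a dashed line is not stated in this talk.

```python
import math

# Hypothetical, fairly noisy readings invented for illustration.
water_pct = [1.0, 2.0, 3.0, 4.0, 5.0]
density = [96.3, 95.8, 96.5, 95.9, 96.6]


def slope_z_test(x, y):
    """Return (slope, standard_error, z, p_value) for a least-squares line.

    The p-value is two-sided and uses a normal approximation (an assumption).
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual sum of squares around the fitted line.
    sse = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    standard_error = math.sqrt(sse / (n - 2) / sxx)
    z = slope / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return slope, standard_error, z, p_value


slope, standard_error, z, p_value = slope_z_test(water_pct, density)
print(f"slope={slope:.3f}, se={standard_error:.3f}, z={z:.2f}, p={p_value:.3f}")
```

A large p-value means the data cannot rule out a coefficient of zero, which is separate from whether the coefficient itself looks numerically small.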
So if we just took this at face value,
you might be inclined to agree with our initial linear regression
and say there's no effect
from H_2O composition on density of the tablet.
But since we've already got our data
in the form of the Structural Equation Model,
let's continue exploring the data
and see if we can uncover any hidden relationships.
So we know that this entire case study
is built up of lots of X factors, lots of Y responses,
and we chose to simplify it down to just one factor and one response.
So let's think about some of the other variables within the system
and how they might relate to this response.
One example which occurs quite a lot within this case study
is an example of a mediation variable.
And what we mean by a mediation variable,
is it's a variable which can't exactly be controlled directly,
but has an impact on our final response
while still being affected by one of our factors that we can control.
So in this case, we're going to take a look at crushing strength
as being one of our mediation variables.
And the reason for this being a mediation variable
is because the crushing strength can be changed in this process.
And it does have an effect on the density of the tablet.
But the crushing strength can only be [inaudible 00:08:50]
certain operational range
depending on the water composition within the tablet.
If the tablet's too dry,
we need to change the crushing strength to match that
so it doesn't completely powder the tablet.
If it's too moist, it's got too much water composition in there,
then it might be a much softer crushing.
So in this way, you see that this is a variable
that we can't change directly
because it needs to be affected by the water composition
but it is important somewhat
in the density of the tablet which we'll explore then.
So if we add this into our path diagram in our Structural Equation Model,
you can see that we get this new triangle between each of our variables
with the number one this time now, in the middle.
And what this is doing in the context of the case study,
is it's saying that now we have our direct response
which is the connection
between the H_2O composition and the density of the tablet.
And we also have our indirect response,
which comes from the connection which goes all the way around the diagram
through crushing strength and into the density of the tablet.
Now, what this means is this is actually a two-part equation.
So not only do we have an extra C as before when there was one X variable,
we now also have another variable which is in a middle part of the equation.
So because of that, the first thing that we notice is on the path diagram,
the connection between the H_2O composition
and the density of the tablet is 0.054,
which is already significantly less of a correlation than we saw last time
when we just included the one variable.
And this is because the rest of the component
actually comes from elsewhere as we're exploring here.
So again, we can see that that connection still
is not statistically significant.
And in fact, has an even weaker correlation than before.
Now, if we look at our other connections,
say, between our crushing strength and density of the tablet,
we're seeing a statistically significant correlation
and a reasonably correlated one at 0.593.
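The split into a direct and an indirect part can be sketched with ordinary least squares. Assuming invented data and a hypothetical helper `mediation_paths`, the code below estimates the path from water to crushing strength (`a`), the path from crushing strength to density (`b`), and the direct path from water to density. For least-squares fits, the single-variable slope equals the direct path plus `a * b`, which is why the direct number shrinks once the mediator is added. This is a simplified illustration, not the estimation JMP's SEM platform performs.

```python
# Hypothetical readings, invented for illustration (not the case study data).
water_pct = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
crushing = [10.2, 10.1, 10.6, 10.4, 10.9, 10.8]
density = [96.0, 95.9, 96.3, 96.1, 96.5, 96.4]


def centered_cross(u, v):
    """Sum of products of the deviations of u and v from their means."""
    u_mean = sum(u) / len(u)
    v_mean = sum(v) / len(v)
    return sum((ui - u_mean) * (vi - v_mean) for ui, vi in zip(u, v))


def mediation_paths(x, med, y):
    """Least-squares estimates of the paths in a simple mediation diagram."""
    sxx = centered_cross(x, x)
    smm = centered_cross(med, med)
    sxm = centered_cross(x, med)
    sxy = centered_cross(x, y)
    smy = centered_cross(med, y)
    a = sxm / sxx                               # x -> mediator
    det = sxx * smm - sxm ** 2
    direct = (sxy * smm - sxm * smy) / det      # x -> y, holding mediator fixed
    b = (sxx * smy - sxm * sxy) / det           # mediator -> y, holding x fixed
    total = sxy / sxx                           # slope from y on x alone
    return {"a": a, "b": b, "direct": direct, "indirect": a * b, "total": total}


paths = mediation_paths(water_pct, crushing, density)
for name, value in paths.items():
    print(f"{name}: {value:.3f}")
```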
So what does this actually mean in the context of this case study?
Well, it means a couple of things
and we draw a couple of conclusions from this.
Firstly, the relationship between the H_2O composition
and the density of the tablet is statistically significant,
which isn't something we knew before; that was hidden
behind the fact that we only performed the linear regression.
And the second thing that we've learned
is that crushing strength and the density of the tablet
are correlated.
So in a practical sense, this means that we can conclude
if we want to control the density of our tablet,
then this must be done through changing the H_2O composition
but only as a means to alter that crushing strength
to the desired value.
And so if we actually came to optimize this entire situation
in this entire case study,
the crushing strength is where we want to pay attention to
and anything we can do to affect that
which isn't something we can change with just a dial.
So in this way, you can see how SEM has allowed us to uncover relationships
which may be missed in traditional modeling methods.
And in turn, it provides a much deeper understanding of how our system works.
So next time you're exploring a system, I'd encourage you to consider applying
SEM to help you better understand
the system both visually and more in depth.