ABCs of Structural Equation Modeling (2021-EU-45MP-752)

Laura Castro-Schilo, JMP Senior Research Statistician Developer, SAS
James R. Koepfler, JMP Research Statistician Tester, SAS

This presentation provides a detailed introduction to Structural Equation Modeling (SEM) by covering key foundational concepts that enable analysts from all backgrounds to use this statistical technique. We start with comparisons to regression analysis to facilitate understanding of the SEM framework. We show how to leverage observed variables to estimate latent variables, account for measurement error, improve future measurement and improve estimates of linear models. Moreover, we emphasize key questions analysts can tackle with SEM and show how to answer those questions with examples using real data. Attendees will learn how to perform path analysis and confirmatory factor analysis, assess model fit, compare alternative models and interpret all the results provided in the SEM platform of JMP Pro.

Auto-generated transcript...

Speaker	Transcript
Laura Castro-Schilo	Hello, I'm Laura Castro-Schilo and welcome to this session, where we're going to learn the ABCs of structural equation modeling.
	And our goal today is to make sure that you have the tools that are needed for specifying and interpreting models, using the Structural Equation Models platform in JMP Pro 16.
	And we're going to do that, first by giving a brief introduction to SEM by just telling you what it is and, particularly, drawing on the connections it has with factor analysis and regression analysis.
	And along the way we're going to learn how path diagrams are essential tools for SEM.
	And we're going to try to keep that introduction fairly brief, so we can really focus on some hands on examples.
	And so, prior to those examples I'm going to introduce the data that we're going to use for that demo and these data are about perceptions of COVID 19 threats.
	And so, looking at those data we're going to start learning about how we specify and interpret our models and we're going to do that by answering specific questions.
	Those questions are going to lead us to talk about very popular models in SEM, one being confirmatory factor analysis and another one, multivariate regression analysis.
	And to wrap it up we're going to show you a model where we bring both of those analyses together, so you really can see how SEM is a very flexible framework where you can fit your own models.
	Okay, so SEM is a framework where factor analysis and regression analysis come together, and on the factor analysis side we're able to gain the ability to measure
	even those things that we cannot observe directly, also known as latent variables.
	And from the regression side we're able to examine relations across variables, whether those are observed or unobserved. And so when you bring those two together, you get SEM, which, you can imagine, is a very flexible framework where all sorts of different models can be fit.
	Path diagrams are really useful tools in SEM, and the reason is that the systems of equations that can be, you know, fairly complicated
	can actually be represented, through these diagrams. And so as long as we know how to draw the diagrams and how to interpret them,
	then we're able to use them to our advantage. So here we have rectangles and so those are used exclusively for representing observed variables.
	Circles are used to represent latent variables, and double-headed arrows are used for representing both variances and covariances, and one-headed arrows are for regression or loading effects.
	There's another symbol that's often used in path diagrams and it's sort of outside the scope of what we're going to talk about today.
	But if you come across it, I just want to make sure you know it's there, and that is a triangle. Triangles are used to represent means and intercepts and so there's all sorts of interesting models we can fit,
	where we are modeling the mean structure of the data, but again we're not gonna have time to talk about those today.
	Now, when it comes to path diagrams I think it's useful to think of what are the building blocks
	of SEM models so that we can use those to build complex models. And one of those would be a simple linear regression. So here, you see, we have
	a linear regression where Y is being regressed on X. And notice both X and Y are in these rectangles, in these boxes, because they are observed variables.
	And we're using that one-headed arrow to represent the regression effect and the two-headed arrows that start and end on the same variable
	represent, in this case, the variance of X, and in the case of Y, it's the residual variance of Y. If a double-headed arrow were to start at one variable and end at the other, then that would be a covariance.
	Now, in SEM any variable can be both an outcome and a predictor. So in this case, Y could also take on the role of a predictor if we had a third variable Z, where Y is predicting Z.
	And so we can build sequential effects, this type of sequential regressions, as many as you need depending on your data.
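The regression building block described above can be sketched in a few lines of Python. This is only an illustration with made-up numbers, not output from JMP: it shows how the three pieces of the diagram (the variance of X, the one-headed arrow from X to Y, and the residual variance of Y) imply a covariance matrix, and how chaining a second regression makes Y both an outcome and a predictor. The variable names and values are assumptions.

```python
# Path-diagram pieces of a simple regression Y = b*X + e (illustrative numbers only).
var_x = 4.0        # double-headed arrow on X: variance of X
b = 0.5            # one-headed arrow X -> Y: regression effect
resid_var_y = 1.0  # double-headed arrow on Y: residual variance of Y

# Covariance structure implied by those three parameters.
cov_xy = b * var_x                     # Cov(X, Y) = b * Var(X)
var_y = b ** 2 * var_x + resid_var_y   # Var(Y) = b^2 * Var(X) + residual variance

implied_cov = [[var_x, cov_xy],
               [cov_xy, var_y]]
print(implied_cov)

# Sequential effect: Z is regressed on Y, so Y is an outcome and a predictor.
c = 2.0              # hypothetical effect of Y on Z
cov_xz = c * cov_xy  # X relates to Z only through Y in this diagram
print(cov_xz)
```

Changing any one of the three parameters changes the implied covariance matrix, which is why the diagram and the system of equations carry the same information.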
	Another building block would be that of a confirmatory factor model, and so that's basically
	the way that we specify latent variables in SEM. And this particular example is a very simple one factor, one latent variable
	confirmatory factor model where the circle represents the unobserved latent variable.
	And notice that latent variable has one-headed arrows pointing to the variables that we do observe, in this case W, X and Y.
	And the reason why that variable points to those squares is because in factor analysis, the idea is that the latent variable causes the common variability we observe across W, X and Y.
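A small sketch can make the "latent variable causes the common variability" idea concrete. Assuming a one-factor model with indicators W, X and Y, the covariance between any two indicators comes only from the factor, while each variance adds its own unique part. The loadings and variances below are hypothetical values chosen for easy arithmetic.

```python
def one_factor_implied_cov(loadings, factor_var, unique_vars):
    """Covariance matrix implied by a one-factor model.

    Off-diagonal: loading_i * loading_j * factor_var (shared variance only).
    Diagonal:     loading_i^2 * factor_var + unique_var_i.
    """
    p = len(loadings)
    cov = [[loadings[i] * loadings[j] * factor_var for j in range(p)]
           for i in range(p)]
    for i in range(p):
        cov[i][i] += unique_vars[i]
    return cov

# Hypothetical parameter values for indicators W, X and Y.
sigma = one_factor_implied_cov(loadings=[1.0, 0.5, 2.0],
                               factor_var=4.0,
                               unique_vars=[1.0, 2.0, 3.0])
for row in sigma:
    print(row)
```

Note that the unique variances never appear off the diagonal: whatever is specific to one indicator cannot contribute to its covariance with the others.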
	And this is really important to understand because latent variables are often confused with components from a principal
	components analysis perspective. And so I think this is a good opportunity to draw the distinctions between latent variables from a factor analytic perspective and components from a PCA perspective, so I'm going to take a little bit of a tangent to explain those differences.
	In this image, the squares represent the variables that we measured, those observed variables. And notice, I'm using these different amounts of
	blue shading on those variables to represent the proportion of variance that is due to what we intended to measure, sort of the signal, the things that we wanted to measure with our instruments.
	And the gray shaded areas are the proportion of variance that is due to any other sources of variance. It can include measurement error, but it can also include systematic variance that is
	unique to each of those measurement instruments.
	And so, in the case of factor analysis, the latent variable captures all of that common variability across the observed variables, and so that's why we're using this solid blue to represent the latent variable.
	And that's in contrast to what happens in principal component analysis, where the goal is dimension reduction. And so in PCA, the component is going to explain the maximal amount of variance
	from the dimensions of our data. And so that means that the principal component is often going to be a combination of the variance that's due to
	what we wanted to measure, but also to some other sources of variance. All right, and so again the diagram also illustrates the causal assumption, the fact that latent variables are
	hypothesized to cause the variability in their indicators, the observed variables, and so that's why those one-headed arrows are pointing toward the observed variables, and that's not the case in PCA.
	Alright, so I think this is a useful distinction to make. When we're talking about latent variables in SEM,
	very often, what we're talking about is latent variables from a factor analysis perspective.
	Okay, so here I've chosen to show you a path diagram that belongs to a model that's already been estimated. So we have all of the values here
	on these arrows because those are all estimates from the model, and I think that this diagram does a good job at illustrating why one might use SEM.
	First, we see that we have unobserved variables. Right here, conflict is an abstract construct that we can't necessarily observe directly and so we're defining it as a latent variable by leveraging the things that we do observe, in this case we have three survey questions that represent
	that unobserved Conflict variable. We are also able to account for measurement error. The way in which latent variables are defined in SEM
	assures us that we are, in fact, accounting for measurement error, because those latent variables only capture the common variance across all of these observed variables.
	Also notice that we are able to examine sequential relations in SEM. So we have this unobserved conflict variable but we're also able to see,
	you know, how does this Support variable, how does that influence Work and then, how does this Work variable in turn influence the latent variable?
	And ultimately, how Conflict, the unobserved variable, can predict all sorts of other outcomes. And so these sequential relations are very useful and very easy to estimate in SEM.
	Another good reason to use SEM is that in JMP Pro, our platform uses cutting-edge techniques for handling missing data, so even if you have a simple linear regression and that's really all you need, if you have missing data,
	SEM makes sure that everything that's present is being used for estimation, and so that can be very helpful as well.
	If what I've said so far piques your interest and you plan on learning more about SEM, without a doubt, you're going to find a lot of terminology
	that is unique to the field. And so like anything else, there's jargon that we need to become familiar with.
	And so, this diagram is also useful to introduce some of that jargon. First, we've been talking about observed variables or measured variables.
	In SEM those are often called manifest variables. We have latent variables, which we discussed already.
	But we also have this idea of exogenous variables and those are the ones that are only going to predict other variables.
	In our model here, we only have two of those and they are in contrast to endogenous variables. And so every other variable here is an endogenous variable because they have other variables predicting them.
	Alright, so those are endogenous variables. We also have latent variable indicators and so these are the variables that are caused by the latent variables.
	And the residual variances that are not explained by the latent variable are called uniquenesses; they're also often called unique factor variances. And so remember that each one is the combination of systematic variance that is unique to that variable in addition to measurement error.
	I find it useful when people are learning SEM to have a shift in focus about
	what the model is really doing. In other words, by realizing that we're doing multivariate analysis of a covariance structure
	(and also means, but remember that we're not talking about means today), that what we're actually analyzing is the structure
	of the covariances of the data, it becomes a lot easier to wrap our heads around SEM.
	Because it has implications for what we think the data are, right? So, for example, you know, we can have our data tables, where each row represents a different observation and each column is a different variable.
	And we can definitely use those data to launch the SEM platform in JMP.
	But in the background, sort of behind the curtain, what the platform is doing is looking at the covariance matrix of those variables, and that is, in fact, the data that are being analyzed.
	And so this also has implications for the residuals. Oftentimes when we think about residuals in SEM, those are with respect to that covariance matrix that we're analyzing.
	And this is also true for degrees of freedom, and the degrees of freedom are going to be with respect to this covariance matrix.
	Right, and so I want to make sure that I give you a little taste of how SEM works in terms of its estimation.
	And so the way we start is by specifying a model, and thankfully, in JMP Pro, we have this really great friendly user interface
	where we can specify our models directly with path diagrams, rather than having to specify or list, you know, complex systems of equations. You can simply draw the path diagrams that imply a specific covariance structure.
	And so the diagrams imply a covariance structure for the data and then during estimation, what we do is try to obtain model estimates that match the sample covariance matrix as closely as possible, based on the model implied
	constraints, basically. And once we have those estimates, we can plug them into the model implied covariance matrix and compare those values against the sample covariance matrix,
	and the difference between them allows us to quantify the fit of the model, right. So if we have large residuals, by looking at the difference between these two covariance matrices, then we know that we have not done a very good job at fitting our model.
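The comparison between the sample covariance matrix and the model-implied covariance matrix can be sketched as follows. This is a minimal illustration with a hypothetical three-variable covariance matrix, and the model shown is one with variances only (no covariances), so all of the misfit shows up in the off-diagonal residuals. It does not reproduce the platform's estimation routine.

```python
def residual_matrix(sample_cov, implied_cov):
    """Element-wise difference: sample covariance minus model-implied covariance."""
    return [[s - m for s, m in zip(s_row, m_row)]
            for s_row, m_row in zip(sample_cov, implied_cov)]

# Hypothetical sample covariance matrix for three observed variables.
S = [[4.0, 2.0, 1.0],
     [2.0, 9.0, 3.0],
     [1.0, 3.0, 16.0]]

# A model that estimates variances but no covariances implies zeros off the diagonal.
implied = [[S[i][j] if i == j else 0.0 for j in range(3)] for i in range(3)]

resid = residual_matrix(S, implied)
largest = max(abs(value) for row in resid for value in row)
print(resid)
print(largest)
```

Large residuals like these signal that the model's constraints (here, zero covariances) do not match the data; a model that reproduced S exactly would have a residual matrix of zeros.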
	Alright, so, in a nutshell that's how SEM works, and I'd like to take now the next part of the presentation to introduce the data that we're going to use for our demo. I do think that
	it's easier to learn new concepts by getting our hands on some real data, real-world examples.
	So the data that we're going to use actually come from a very recent article that was published in the journal Social Psychological and Personality Science, and so this was published in the summer of 2020.
	And the authors wanted to answer a very simple question. They said, you know, how do perceived threats of COVID 19 impact well being and public health behaviors? And so
	it's a simple question, except for the fact that perceived threats of COVID 19 is a completely new construct, right. It's a very abstract idea.
	You know, what is perceived threats of COVID 19 and how do you measure that, right? And so, because this is something that has never been measured before, the authors had to engage in a very careful
	study where they developed a survey to be able to measure those threats. And
	developing a survey is not easy, right. We need to make sure that our questions for our survey are reliable, that they're valid.
	And so they had to go through the process and we're going to see how they did that in a minute.
	Now, in their study they found that there's two types of threats that they could measure, one they called realistic threats, and that's
	things that threaten our financial and physical safety.
	And the other type of threat was...they called it symbolic. So those are things that threaten our social, cultural identity, right. And it's also important to say this sample was for the United States population. They sampled
	over 1,000 individuals and so their questions pertain exclusively to the United States population.
	And what we see here, this is actually the integrated COVID 19 threats scale, so this is the questionnaire that they developed after going through three different studies. And so they found that those two threats
	could be measured with a handful of items. They asked their participants to answer how much of a threat, if any, is the coronavirus outbreak, for your personal health, the health of the US population as a whole,
	your personal financial safety and so on. And for symbolic threat, the questions were, you know,
	how much of a threat is the virus for what it means to be an American, American values and traditions, and the rights and freedoms of the United States population as a whole, and so on. So you can see the differences in what these threats represent.
	So we had access to these data and we're going to use those data to answer very specific questions. First, how do we measure these perceptions of COVID 19 threat and we're going to focus on the two threats that they identified.
	And so, this is going to lead us to talk about confirmatory factor analysis and assessing a measurement model to make sure we can figure out if the questions in the survey are, in fact, reliable and valid.
	Notice we're going to skip over this very important first step, which is exploratory factor analysis and that's something that one would do before
	using SEM, right. You would run an exploratory factor analysis and then you come to SEM to confirm the structure of the previous results. The authors of this article definitely did that, but we're going to focus on the steps that we would follow using SEM.
	The second question is, do perceptions of COVID 19 threat predict well being markers and public health behaviors? And so this question is going to lead us to talk about multiple regression and path analysis within SEM.
	And the last question is, are the effects of each type of threat on outcomes equal? And this actually allows us to show a very cool feature of SEM, which involves setting equality constraints in our models and conducting systematic model comparisons to answer these types of questions.
	Alright, so it's time for the demo.
	I already have the data table from JMP open right here. These data, you can see, have 238 columns, and that's because the authors asked
	a number of different questions of 550 participants in this case; this is one of their three studies.
	And the first 10 columns that I have in the data correspond to those 10 questions we saw in their threats scale.
	And so, those are going to be the ones we use first to do a confirmatory factor analysis. And so we're going to click analyze, we'll go to multivariate methods, structural equation models.
	And we are going to use those 10 variables and click model variables, and then we're going to click OK to launch the platform.
	And notice that on the right hand side we immediately see there's a path diagram.
	And that diagram has already, you know, all of the features that we discussed earlier, so each of the variables are in rectangles, suggesting that they're observed variables.
	And each of them have these double-headed arrows that start and end on themselves, so they represent a variance of each of those variables.
	Now, if I right click on the canvas there's a show menu, and notice that the means and intercepts are hidden by default.
	I'm going to click on this just to show you that we do, in fact, have
	means estimated by default from all of these variables. And so we're not going to talk about those, so we're going to keep them hidden, but I do think it's important to know
	that the default model that we start with when we launch the platform is one where all of the variables have variances and means estimated.
	Now, on this
	tab, we have a list tab, and if we click on that we see that we have
	the exact same information that we have in the diagram but in a list form. And so all of the different types of parameters are split based on the type of parameter it is, so we have all of the variances here and all of the means over there.
	Right, we have a status tab and this basically tells us, you know, about the the specific model we have specified right now. It gives us a bunch of useful information about that model.
	We have details about the data, our sample size, the degrees of freedom, and we also have these identification rules. You can click on them if you want to learn a little bit more about them. It gives you a little description to the right.
	But what's really helpful to know is that this icon for the status tab is constantly changing, depending on the changes we make and on the specification of the model we have. And so oftentimes, if we have an advanced application of SEM, this
	icon might be yellow, and when we have a bad error, some important mistake, then that icon is going to be orange with an X, basically indicating that there's an error. So it can be very useful for identifying mistakes as we are specifying our models.
	Now to the left side of the user interface, we see that we can
	specify the name of our model, so this is very helpful to keep track of our workflow. And we also have these From and To lists.
	And so, these lists provide a very useful way to link variables, either using a one-headed arrow or a two-headed arrow.
	So here, for example, if I want to link these, I can click that button and very quickly I've drawn a path diagram, right. So it's a very efficient way to specify models.
	And so I'm going to click on reset here just to go back to the model that we had upon launching the platform, but know that the From and To lists are basically ways in which we can draw the diagrams.
	Okay, in this case we have all of the observed variables listed here, but I know that we want to use those variables to specify latent variables.
	Now, the first five variables here are the ones that correspond to the items in that survey for the realistic threat. And so I'm going to add a latent variable to the model by going down
	to this box down here, where it says Latent1 and I'm going to change the name to Realistic, because I want
	these five variables to be the indicators of a realistic threat latent variable. And so by clicking on this button,
	I immediately get that latent variable specified. And notice, the first loading for this realistic threat latent variable has a 1
	on this arrow, and that basically represents the fact that the parameter is fixed to the value of 1.
	And we do this because we need to set the scale of the latent variable. Without this constraint, we would not be able to identify the model and so
	by default we're going to place that constraint on the first loading of the latent variable, but we also could achieve
	the same purpose if we fixed the variance of the latent variable to 1. So which one we do is really a matter of choice, but as a default, the platform will fix the first loading to 1. Okay, so we have a realistic threats latent variable and the other five
	variables here are the ones that correspond to the symbolic threats questions. And so I'm going to select those and click here. I'm going to type Symbolic and I'm going to click the plus button to add that symbolic threat.
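The point made above about setting the scale of a latent variable (fixing the first loading to 1 versus fixing the latent variance to 1) can be checked numerically. The sketch below uses hypothetical loadings for a three-indicator factor and shows that both choices imply exactly the same covariance matrix, which is why the choice is a matter of convenience. The helper name `implied_cov` and all numbers are assumptions for illustration.

```python
import math

def implied_cov(loadings, factor_var, unique_vars):
    """One-factor implied covariance: loading_i * loading_j * factor_var, plus unique variances on the diagonal."""
    p = len(loadings)
    cov = [[loadings[i] * loadings[j] * factor_var for j in range(p)]
           for i in range(p)]
    for i in range(p):
        cov[i][i] += unique_vars[i]
    return cov

unique_vars = [1.0, 1.0, 1.0]

# Option 1: first loading fixed to 1, factor variance freely estimated (here 4.0).
loadings_a = [1.0, 0.5, 1.5]
cov_a = implied_cov(loadings_a, 4.0, unique_vars)

# Option 2: factor variance fixed to 1, so every loading absorbs sqrt(4.0) = 2.
scale = math.sqrt(4.0)
loadings_b = [loading * scale for loading in loadings_a]
cov_b = implied_cov(loadings_b, 1.0, unique_vars)

print(cov_a == cov_b)  # both parameterizations describe the same covariance structure
```

Without one of these constraints, the loadings and the factor variance could trade off against each other indefinitely, which is the identification problem the fixed value of 1 resolves.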
	Okay, so we're almost done, but notice that this model here is implying that realistic and symbolic threats are perfectly uncorrelated with each other, and that's a very strong assumption. And so we don't want to do that.
	For the most part, confirmatory factor models allow the latent variables to covary with each other, and so I'm going to select them here, and I can click this double-headed arrow to link those two nodes.
	But I can also do it directly from the path diagram. So if I right click on the latent variable, I can click on add covariances and
	right there, I can add that covariance. So it's a pretty cool way. You can do it with the lists, you can do it directly on the diagram,
	whichever you prefer. And so our model is ready to be estimated, so I'm going to change the name to 2-Factor CFA and we can go ahead and run it.
	And you can see, very quickly, we obtain our estimates and they're all mapped onto the diagram, which is pretty cool. But before we interpret those results, I want to make sure we focus on this model comparison table.
	The reason is that table provides us a lot of information about the fit of the model, and we want to make sure the model fits well before we interpret the results. So
	the first thing to notice here is that we have three models in this table and we only fit one of them.
	So the reason we have three is because the first two models, the unrestricted and independence models, are fit by default upon launching the platform.
	And so we fit these models on purpose to kind of provide a baseline for what's a really good fitting model and what's a really bad fitting model, and so we use those as a frame of comparison
	with our own specified models. So let me be a little more specific. For example, the unrestricted model would be a model (I'm going to show you with the path diagram),
	the unrestricted model is one where every variable is allowed to covary with each other, all right. And so notice that
	the Chi square statistic, which is a measure of misfit, is exactly zero, and the reason is because this model fits the data perfectly. Remember, our data here are really the covariance matrix of the data, right, and so we
	have zero degrees of freedom because we are estimating every possible variance and covariance in the data.
	So this is the best possible scenario, right. We have no misfit but we're also estimating every possible parameter from the data.
	The other end of the spectrum is a model that would fit really badly, and that's what the independence model is. So if I show you with the path diagram,
	here our default model, where we only have variances and means for the data, that is exactly what the independence model is. And
	that is essentially a model where nothing is covarying with anything else, and you can see the Chi square statistic for that model is in fact pretty large, because there's a lot of misfit, right, so it's almost 2000 units, but we do have
	45 degrees of freedom because we're estimating very few things from the data. And so again, these two models basically provide
	the two ends of the spectrum, right. On the one side, a really good fitting model and on the other side, a really poor fitting model, and so we're going to be able to use that information to compare our own model against those.
	So, if we look at our model, notice the Chi square statistic is not zero, but it is only 147 units, which is a lot less than 2000.
	And we have 34 degrees of freedom, so we do have some misfit. And when we look at the test for that Chi square, it is a significant Chi square statistic, so it suggests that we have statistically significant misfit in the data.
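The degrees of freedom quoted for the three models (0, 45 and 34) follow from counting the unique variances and covariances among the 10 items and subtracting the free parameters. The sketch below reproduces that count under the assumption that means are left out of the tally, since each model described here estimates one mean per observed variable and those cancel. The parameter breakdown for the two-factor CFA is inferred from the model as described in the demo.

```python
def covariance_df(n_observed, n_free_params):
    """Degrees of freedom = unique variances/covariances minus free parameters."""
    n_unique = n_observed * (n_observed + 1) // 2
    return n_unique - n_free_params

p = 10  # the ten threat items

# Unrestricted model: every variance and covariance is estimated.
df_unrestricted = covariance_df(p, 55)

# Independence model: only the 10 variances are estimated.
df_independence = covariance_df(p, 10)

# Two-factor CFA: 8 free loadings (the first loading of each factor is fixed to 1),
# 10 unique variances, 2 factor variances and 1 factor covariance.
df_cfa = covariance_df(p, 8 + 10 + 2 + 1)

print(df_unrestricted, df_independence, df_cfa)
```

Dropping the covariance between the two latent variables would free up one more degree of freedom, at the cost of the strong assumption that the factors are uncorrelated.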
	However, the Chi square statistic is influenced by sample size, and, in this case we have 550 observations. And so
	usually, when you have 300 or more observations, it's very important to not only look at the Chi square statistic, but also at some special
	fit indices that are unique to SEM that allow us to quantify the fit of the model, and that's what the values are over to the right here.
	This first fit index is called the comparative fit index and that
	index ranges from zero to one, so you can see the unrestricted model has a one. That's the best fitting model, and the independence model has zero, because it's the worst fitting model, alright, and our model actually has a CFI of .93, about .94.
	And so that represents the proportion of improvement from the independence model. So another way to say that is
	our model fits about 94% better than the independence model does, so that's actually pretty good. And usually we want CFI values of .9 or higher. The closer to one, the better.
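The "proportion of improvement over the independence model" reading of the CFI can be sketched with the usual textbook formula, which compares each model's Chi square in excess of its degrees of freedom. The Chi square of 147 with 34 degrees of freedom comes from the demo; the independence Chi square is only described as "almost 2000", so the 2000.0 used below is an approximation, and the result should be read as a rough check rather than the platform's exact value.

```python
def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index from model and baseline (independence) Chi squares."""
    excess_model = max(chi2_model - df_model, 0.0)
    excess_baseline = max(chi2_baseline - df_baseline, excess_model, 0.0)
    if excess_baseline == 0.0:
        return 1.0  # nothing to improve upon
    return 1.0 - excess_model / excess_baseline

# 147 and 34 are from the demo; 2000.0 approximates the "almost 2000" baseline.
cfi_cfa = cfi(147.0, 34, 2000.0, 45)
print(round(cfi_cfa, 3))

# The two reference models sit at the ends of the scale.
print(cfi(0.0, 0, 2000.0, 45))     # unrestricted model
print(cfi(2000.0, 45, 2000.0, 45)) # independence model
```

With these inputs the index lands near .94, in line with the value read off the Model Comparison table.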
	Now the root mean square error of approximation is another fit index, but that one,
	although it also ranges from zero to one, we want very, very low values in that index. So notice the unrestricted model has a value of zero
	and the independence model is .27. We usually want values here that are .1 or lower for acceptable models. And ours has a .07, about .08, and that's actually pretty good.
	We also have some confidence intervals for this particular estimate, and you can see that those values are also below .1, so this is a good fitting model, right. And so once we know that the model fits our data well, then we can go ahead and interpret it.
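The RMSEA point estimate can be approximated from the same Chi square, degrees of freedom and sample size. The sketch below uses a common textbook form with N - 1 in the denominator; some software uses N instead, and which one the platform uses is not stated in the talk, so treat that choice as an assumption. The confidence interval is not reproduced here.

```python
import math

def rmsea(chi2, df, n_obs):
    """Root mean square error of approximation (point estimate only)."""
    if df == 0:
        return 0.0  # a saturated model has no misfit per degree of freedom
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n_obs - 1)))

# Chi square 147, 34 degrees of freedom and 550 observations, as quoted in the demo.
rmsea_cfa = rmsea(147.0, 34, 550)
print(round(rmsea_cfa, 3))
```

The result is a little under .08, consistent with the ".07, about .08" read from the table and below the .1 cutoff mentioned above.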
	Now, as a default in our estimates, we are going to show you the unstandardized parameter estimates.
	But for factor analysis, it's much more useful to look at the standardized solution, so I'm going to right click on the canvas and choose to show the standardized estimates.
	And so, now the values here are in a correlational metric so we want those values to be as close to one as possible,
	because they represent the correlation of the observed variable with the latent variable. And notice, both for realistic and symbolic threat, the values are pretty good.
	We don't want them to be any lower than about .4, and so these values are good. Another thing that is really useful, and it's unique to JMP Pro, is that any variable that's endogenous, that has
	predictors pointing at it, is shaded.
	Notice here there's a little bit of gray inside these squares, and so that shading is proportional to the amount of variance explained by the predictors.
	And so it allows us to see very quickly for which variables we're doing a really good job of explaining their variance. In this case, it seems like these three variables
	are filled the most with that darker gray, suggesting that the symbolic threats latent variable is doing a pretty good job at explaining the variance of these three observed variables.
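The link between an unstandardized loading, its standardized counterpart, and the proportion of variance explained can be sketched as follows. This assumes an indicator predicted by a single latent variable, and the numbers are hypothetical, chosen so the arithmetic is easy to follow.

```python
import math

def standardized_loading(loading, factor_var, unique_var):
    """Correlation between an indicator and its latent variable (single-factor indicator)."""
    indicator_var = loading ** 2 * factor_var + unique_var
    return loading * math.sqrt(factor_var) / math.sqrt(indicator_var)

# Hypothetical estimates: loading 1.0, factor variance 9.0, unique variance 16.0.
std_loading = standardized_loading(1.0, 9.0, 16.0)
variance_explained = std_loading ** 2

print(round(std_loading, 2))         # correlation metric
print(round(variance_explained, 2))  # proportion of the indicator's variance explained
```

Under these assumptions, the squared standardized loading is the quantity the shading in the diagram would be proportional to: the share of the indicator's variance accounted for by the latent variable.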
	We also see that the two latent variables are correlated about .4, which is an interesting finding.
	And there's all sorts of output that we could focus on here in the red triangle menu, but I'm going to focus specifically on one
	option called assess measurement model. And this is where we're going to find a lot of statistics where we can quantify the reliability and the validity of our constructs.
	So if we click there, we have this nice little dashboard. And the first information we have here is
	indicator reliability, so this quantifies the reliability of each of the questions in that survey, and we provide a plot that is
	showing us all of these values. And notice, we have a red line here for a threshold of what we hope to have, right. We want to have at least that much reliability in each of our items. Now,
	you know, these types of thresholds need to be interpreted, you know, with our own critical thinking, because obviously,
	you know, this particular item, for example, is below the threshold, but it's still pretty close to the threshold so we're not going to throw it out. We can still consider it
	relatively reliable and it's still a good indicator of this latent variable. But again, just interpret the thresholds here with caution.
	But one thing that is apparent from this plot is that the symbolic threats latent variable appears to have more reliable indicators than the realistic threats. They're both pretty good, though, but the symbolic one, you know, we're doing a better job of measuring that.
	The values to the right are reliability coefficients for the composites. In other words, they quantify the reliability of the latent variable as a whole and there's two types of reliability.
	I'm not going to get into the details of their differences but notice these values range from zero to one and we want them to be as close to one as possible. And we also provide
	some plots with a threshold indicating the minimum amount of reliability that we want, and, in this case, both realistic and symbolic threat have good reliabilities.
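The talk does not name the two composite reliability coefficients, so the following is only a sketch of one widely used option, coefficient omega, and should not be read as the platform's exact computation. It assumes standardized loadings (latent variance of 1, so each unique variance is 1 minus the squared loading), and the loadings are hypothetical.

```python
def coefficient_omega(std_loadings):
    """Composite reliability: common variance of the sum score over its total variance."""
    common = sum(std_loadings) ** 2
    unique = sum(1.0 - loading ** 2 for loading in std_loadings)
    return common / (common + unique)

# Hypothetical standardized loadings for a five-item latent variable.
omega = coefficient_omega([0.8, 0.7, 0.6, 0.7, 0.8])
print(round(omega, 3))
```

Like the values in the dashboard, this coefficient ranges from zero to one, and it increases as the loadings get stronger or as more good indicators are added.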
	And the other visualization we have here is for a construct validity matrix and
	keep in mind that when you're trying to measure something that you don't see directly, it's very hard to figure out if it really
	is what you intend it to measure. Are you really measuring what you wanted? And that's what this information allows us to determine.
	The visualization here is portraying the upper triangular of this matrix, and let me just explain briefly what the values represent.
	Below the diagonal, we have the correlation between the latent variables. That's about .4. The diagonal entries represent the average amount of variance
	that the latent variables extract from their indicators. And so you want those values to be as high as possible.
	And above the diagonal we have the squared correlation. In other words, it's the amount of overlapping variance between the latent variables.
	And so the key to interpreting this matrix is we want values in the diagonal to be higher
	than the values above and to the right of the diagonal, and notice here, the visualization makes it very easy to see that we do, in fact,
	have larger values in the diagonal than we have above or to the right. And that is good evidence of construct validity.
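The comparison just described (diagonal entries larger than the squared correlation) can be sketched in a few lines of Python. Only the correlation of about .4 comes from the demo; the loadings are hypothetical, and the function names are illustrative rather than anything from JMP.

```python
def average_variance_extracted(loadings):
    """Mean squared standardized loading for one latent variable."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)


def diagonal_exceeds_shared_variance(ave_a, ave_b, correlation):
    """True when both diagonal entries exceed the squared latent correlation."""
    shared_variance = correlation ** 2
    return ave_a > shared_variance and ave_b > shared_variance


# Hypothetical standardized loadings for the two latent variables.
ave_realistic = average_variance_extracted([0.70, 0.65, 0.75, 0.60, 0.70])
ave_symbolic = average_variance_extracted([0.80, 0.85, 0.75, 0.80, 0.70])

# Latent correlation of about .4, as read from the matrix in the demo.
evidence = diagonal_exceeds_shared_variance(ave_realistic, ave_symbolic, 0.4)
print(ave_realistic, ave_symbolic, evidence)
```

With a correlation of .4 the shared variance is only .16, so any average variance extracted above that value satisfies the check.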
	And so, everything here is suggesting that both the realistic and symbolic threats are, in fact,
	latent variables that are valid, that are reliable, and the survey seems to do a good job of measuring both of these.
	So a next step might be, we could choose perhaps to grab all of those five questions that represent the realistic threats here, and we could create an average across all of these.
	And all of a sudden, we would have one measure that represents realistic threats. We could do that and we could do the same for the other five variables that represent symbolic threats.
	And so, just for illustration, I actually have already created those variables, so let's go to Analyze, Multivariate Methods, Structural Equation Models. And I'm going to look for those
	average variables. So the realistic and symbolic threats here, these are the averages across the columns for each of these variables.
	And I'm going to model those in addition to...we have a measure for anxiety, we also have a measure for negative affect, or negative emotions.
	And lastly, we have a measure for adherence to public health behaviors, and so we're going to model that as well, and we're going to click OK to launch the platform.
	With the diagram buttons here, we can go into a customized menu and change all sorts of aspects of the diagram, which is really, really great. Right now, I'm just going to focus on increasing
	the width of these variables, so that we can read what's inside the nodes. And what I'm going to do is fit a model where both realistic and symbolic threats are going to be predictors of these interesting outcomes, right. These are sort of
	markers for anxiety, negative affect, and also the public health behavior, so we're going to link these with a one-headed arrow to specify the predictions. So we're going to investigate whether these effects are, in fact, significant.
	Now notice I'm not fully done specifying this model yet, because in this particular model,
	there's no connection between the realistic and the symbolic threats, and that would be a very strong constraint in the model to say that these two things aren't
	covarying at all. And so we always want to make sure that we include covariances between our predictors and also between the residual variances of our outcomes.
	And so we could specify those directly from the From list, in this case I'm going to use add covariances from this menu, and I'm going to link the realistic and symbolic threats.
	I'm also going to use the lists to add covariances between the residuals of these outcomes. And now we have a full, correctly specified model. This is often called path analysis, but
	it's basically a simultaneous collection of regression models, and so we're going to run that.
	And notice from the model comparison table that our model has no degrees of freedom and has a perfect Chi Square and the Chi square is zero.
	But essentially by having zero degrees of freedom, it means that our model is not testable,
	because we've extracted all the information we could have extracted from our data. So that's essentially what we do when we fit a regression model. So there's no problem with that, but just know that you can't interpret this Chi Square and say, oh my model fits so well.
	It fits well because you've extracted everything you could have extracted from the data. So anyhow, it's just like a regression model.
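The zero degrees of freedom can be checked by counting. The sketch below assumes the model as specified in the demo: two predictors, three outcomes, every predictor pointing to every outcome, covariances among the predictors, and covariances among the outcome residuals. Means are left out of the count because they are also fully reproduced in this kind of model.

```python
def unique_moments(n_vars):
    """Number of distinct variances and covariances among n_vars variables."""
    return n_vars * (n_vars + 1) // 2


def path_model_df(n_predictors, n_outcomes):
    """Degrees of freedom for a path model with all paths and covariances free."""
    data_points = unique_moments(n_predictors + n_outcomes)
    free_parameters = (
        n_predictors * n_outcomes        # one-headed arrows (regression effects)
        + unique_moments(n_predictors)   # predictor variances and covariances
        + unique_moments(n_outcomes)     # residual variances and covariances
    )
    return data_points - free_parameters


# Two threat composites predicting anxiety, negative affect, and health behaviors.
df = path_model_df(n_predictors=2, n_outcomes=3)
print(unique_moments(5), df)
```

Fifteen variances and covariances are available and fifteen parameters are estimated, so nothing is left over to test the model.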
	Alright, so if we go and look at the results, you know, there's all sorts of really important information that we can interpret but I'm going to focus on a couple of things. First,
	notice our diagrams are fully interactive, which is really, really cool. And I'm just moving things around to focus on a couple of effects. I'm going to hide the variances and the covariances in this model, so that we can really focus on the results from the regression models,
	from the path analysis. And notice here, so realistic and symbolic threats, both of them have a positive effect on anxiety.
	So that's really interesting and here the arrows are solid because the effects, you can see here in the table of parameter estimates, are statistically significant. If they were not significant,
	the arrows would show up as dashed arrows. So the path diagram conveys a lot of information. So we have positive significant effects on anxiety
	and that's interesting, of course, but so far, all we've done, again, is fit regression models in a simultaneous way.
	And, in fact, if we go back to the data table, I have a script here from fit model, where I actually use that same anxiety
	outcome and the same two predictors, realistic and symbolic threats,
	and I simply estimate a multiple regression model. And the reason I wanted to show you this is because, notice the parameter estimates here are exactly the same values that we obtained from SEM.
	And that's no surprise, because, in fact, we are doing a simultaneous regression. So up until this point, you might wonder
	what does SEM buy you, right, because technically, you could run three separate fit models with these three outcomes and you could obtain the same information that we've obtained so far.
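One way to see why the two sets of estimates agree: multiple regression slopes depend on the data only through variances and covariances, and a path model with zero degrees of freedom reproduces those variances and covariances exactly. The sketch below computes two-predictor slopes from covariances alone. The numbers are invented for illustration and are not the COVID-19 data.

```python
from statistics import mean


def covariance(a, b):
    """Sample covariance between two equally long lists of numbers."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def slopes_from_covariances(x1, x2, y):
    """Least-squares slopes of y on x1 and x2, using only (co)variances."""
    s11, s22, s12 = covariance(x1, x1), covariance(x2, x2), covariance(x1, x2)
    s1y, s2y = covariance(x1, y), covariance(x2, y)
    determinant = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / determinant
    b2 = (s11 * s2y - s12 * s1y) / determinant
    return b1, b2


# Invented predictors; the outcome is built with known slopes 0.5 and 0.2.
realistic = [1.0, 2.0, 3.0, 4.0, 5.0]
symbolic = [2.0, 1.0, 4.0, 3.0, 6.0]
anxiety = [1.0 + 0.5 * r + 0.2 * s for r, s in zip(realistic, symbolic)]

b_realistic, b_symbolic = slopes_from_covariances(realistic, symbolic, anxiety)
print(b_realistic, b_symbolic)
```

Because the outcome here is constructed without noise, the recovered slopes match the values used to build it, up to floating-point rounding.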
	Well, if you have missing data, you still want to use SEM, because then all of the data are going to be used rather than dropping rows.
	But with SEM you're also going to be able to answer additional questions that are pretty interesting.
	In this case, we might wonder whether the effect that realistic threat has on anxiety is statistically greater
	than the effect that symbolic threat has on anxiety, right. So far, we know that they're both significantly different from zero but are they significantly different from each other?
	And that is a question that we can answer by using the SEM platform. And going back to our model specification panel, we can select both of those effects and up here in the Action buttons we have the set equal button, so if we press that, notice we get a little
	label here that implies that both of these effects are going to be set to equal. They're going to be estimated as one. And so if we change the name here to equal effects and we run this model, we're going to obtain the fit statistics for that specific
	model that has the equality constraints. And notice we've now gained one degree of freedom, so all of a sudden, we have a testable model.
	And we can use the model comparison table to select the two models that we want to compare against each other, and click Compare Selected Models.
	And now we obtain a Chi square difference test, so we're able to compare the model statistically and see what is the amount of misfit that our
	equality constraint induces in the model. And here we can see it's about 8.6 units in the Chi square metric and the p value for that is, in fact, significant. So this suggests that
	setting that equality constraint induces a significant amount of misfit in the model. And because we know Chi square is influenced by sample size, we also have the difference in those
	fit indices that we discussed, and for the CFI, we usually don't want this to change by any more than .01
	at the most. In fact, .01 or higher is not so good. And for RMSEA, you know, you don't want this to be any higher than .1.
	So all of the evidence here suggests that setting the equality constraints leads to a significantly worse fitting model.
	In other words, if we go back to the model that fit best, we're now able to say, based on that Chi square difference test,
	that the effect that realistic threat has on anxiety is significantly higher than the effect the symbolic threat has on anxiety.
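The Chi square difference test from the demo can be reproduced approximately by hand. The sketch below uses the difference of about 8.6 and the one degree of freedom gained by the equality constraint, and it relies on the fact that a Chi square variable with one degree of freedom is a squared standard normal. The function names are illustrative, and the p value function only handles the one-degree-of-freedom case.

```python
import math


def chi_square_difference(chisq_constrained, df_constrained, chisq_free, df_free):
    """Difference in Chi square and in degrees of freedom between nested models."""
    return chisq_constrained - chisq_free, df_constrained - df_free


def p_value_one_df(chi_square):
    """Upper-tail p value of a Chi square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2.0))


# Unconstrained path model: Chi square 0 with 0 df.
# Equal-effects model: Chi square of about 8.6 with 1 df (rounded from the demo).
delta_chisq, delta_df = chi_square_difference(8.6, 1, 0.0, 0)
p_value = p_value_one_df(delta_chisq)
print(delta_chisq, delta_df, p_value)
```

The resulting p value is well below .01, which agrees with the conclusion that the equality constraint adds a significant amount of misfit.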
	And so those types of questions, you know, we could address them with other parts of this model, but again SEM affords a lot of flexibility by allowing us to compare the equality of different effects within the model.
	Okay, and so, in the interest of time I'm going to close this out, but I do want to show you one more thing. So far, we saw a confirmatory factor model. We also saw
	a path analysis, where we're doing a multivariate regression analysis, but we can actually use both of those concepts in one model. And so I have a script that I've already saved in my data table, and you can see what I'm doing here in this model is actually
	estimating latent variables. I'm modeling latent variables for both symbolic and realistic threats, using the original items from the survey, from the questionnaire.
	And so, by doing this, instead of creating averages across the columns,
	I'm actually going to model the latent variables, and that allows me to obtain regression effects among latent variables that are unbiased and unattenuated by measurement error, because I'm obtaining
	a more valid, a more pure measure of symbolic and realistic threats. And so here we are estimating, you can see, sequential relations, and my model here is a lot more complex.
	I'm not going to get into the details of the model, but just know that by modeling latent variables and looking at the relations between latent variables, we're really able to obtain the best
	functionality from SEM, because our associations between those latent variables are going to be better estimates
	for the model. And I actually estimated this model, you can see the results down here. And so notice here, there are a few edges
	that have arrows that are sort of dashed, indicating that those effects are not significant. We also see how powerful the visualization of the shading is, right. We're able to explain
	some proportion of variance of adhering to public health behaviors. And it seems like we're doing a better job of explaining variance on positive affect than we are on any of the other outcomes here. And so again it's
	basically, the best of both worlds, being able to specify our latent variables but also model them directly using our platform.
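The earlier remark that effects among latent variables are unattenuated by measurement error can be illustrated with the classical attenuation formula from test theory, in which an observed correlation equals the true correlation times the square root of the product of the two reliabilities. This is a simplified stand-in for what the latent variable model does, and the numbers are hypothetical.

```python
import math


def attenuated_correlation(true_correlation, reliability_x, reliability_y):
    """Correlation expected between two error-prone measures."""
    return true_correlation * math.sqrt(reliability_x * reliability_y)


def disattenuated_correlation(observed_correlation, reliability_x, reliability_y):
    """Estimate of the error-free correlation implied by an observed one."""
    return observed_correlation / math.sqrt(reliability_x * reliability_y)


# Hypothetical values: a true correlation of 0.5 between two constructs
# measured by averages with reliabilities of 0.8 and 0.7.
observed = attenuated_correlation(0.5, 0.8, 0.7)
recovered = disattenuated_correlation(observed, 0.8, 0.7)
print(observed, recovered)
```

The observed value is smaller than the true one whenever either reliability is below one, which is the bias that averaging the survey items leaves in place.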
	And so, with that I'm going to stop the demo here, but I will direct you to the fact that in the JMP Community website, we have supplementary materials that
	James Koepfler has created. They are really great materials that have a lot of tips on how to interpret our models, how to use the model comparison table,
	and basically all the notes that you would have wanted to take during this presentation, you can get them on the supplementary materials. And so with that I am ready to open it up for questions.


Presented At Discovery Summit Europe 2021
