Gabriel
Level III

Categorical variables in structural equation models

First, I am new to using SEM. I see that in JMP (JMP 19) there is no way to add categorical variables as controls within the SEM analysis. I have more than one categorical variable column.

Is there a way to do this that I am just not aware of? Also, is there a way to deal with NaNs (missing values) and to standardize the continuous variables?

Gabriel Mulero
LauraCS
Staff

Re: Categorical variables in structural equation models

Hi @Gabriel,

When categorical variables are used as "controls" or covariates, we need to follow the same approach as in regression analysis. That is, we create numeric, dummy-coded variables and enter them as predictors in the model.
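In plain Python (outside JMP), the coding rule can be sketched like this: a categorical variable with k levels becomes k - 1 columns of 0/1 values, and the omitted reference level is the row of all zeros. This is only a minimal illustration; the function name `dummy_code` is hypothetical, and it keeps a missing category missing rather than silently coding it as the reference level.

```python
from typing import Dict, List, Optional, Sequence

def dummy_code(values: Sequence[Optional[str]], reference: str) -> Dict[str, List[Optional[int]]]:
    """Return one 0/1 column per non-reference level; None (missing) stays None."""
    levels = sorted({v for v in values if v is not None} - {reference})
    return {
        level: [None if v is None else int(v == level) for v in values]
        for level in levels
    }

# Two levels -> one dummy (like the "Online" variable described below)
engagement = ["online", "in-person", "online", None]
print(dummy_code(engagement, reference="in-person"))
# {'online': [1, 0, 1, None]}

# Three levels -> two dummies; "A" is the reference (all zeros)
region = ["A", "B", "C", "A"]
print(dummy_code(region, reference="A"))
# {'B': [0, 1, 0, 0], 'C': [0, 0, 1, 0]}
```

Each resulting column is numeric, so it can enter the SEM like any other continuous variable.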

Here's one example with a mediation model. Suppose we survey consumers about their perceptions of the Privacy and Reputation of an organization, and we also ask them about their intent to make a purchase. We theorize that higher perceptions of Privacy lead to better Reputation, which in turn increases Purchase Intentions. We also believe that Privacy directly impacts Purchase Intentions. The path diagram for this model looks like this:

[Figure: Simple mediation model]

Importantly, we know that some of these consumers engage with the organization only online (through their website/app), while other consumers engage in person. Thus, we want to control for these two ways of engaging with the company. The categorical variable has two levels (online and in-person), so we need only one dummy variable. We create one labeled "Online," which has a value of 0 for in-person and 1 for online engagement. This variable is numeric/continuous, so it can enter the analysis just like all the others. Because we have a mediation model, we can control for it by entering it as a predictor of the endogenous (outcome) variables. Here's the path diagram that shows this:

[Figure: Mediation model with a control variable]

Notice there's a two-headed arrow (a covariance) between the Online and Privacy variables. This is the standard specification... that is, a covariance indicates that we don't have a hypothesis about the association between the Online and Privacy variables.
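Written out as equations, the model with the Online control can be sketched as follows. This uses standard path-analysis notation; the coefficient labels $a$, $b$, $c'$, $c_1$, $c_2$, the residuals $e_1$, $e_2$, and $\sigma_{PO}$ are labels introduced here for illustration, not names from JMP's output, and intercepts and residual variances are omitted for brevity.

$$
\begin{aligned}
\text{Reputation} &= a\,\text{Privacy} + c_1\,\text{Online} + e_1 \\
\text{Purchase} &= b\,\text{Reputation} + c'\,\text{Privacy} + c_2\,\text{Online} + e_2 \\
\operatorname{Cov}(\text{Privacy}, \text{Online}) &= \sigma_{PO} \quad \text{(freely estimated: the two-headed arrow)}
\end{aligned}
$$

Under this specification, the indirect effect of Privacy on Purchase Intentions through Reputation is $a \times b$ and the direct effect is $c'$, both estimated while holding Online constant.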

With multiple categorical variables to control for, you'd have to create multiple dummy variables and add them in a similar way. Make sure they predict the outcomes where you want to control for them... and make sure they covary with the other exogenous variables (those that don't have anything predicting them).
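As a rough illustration of the bookkeeping (plain Python, outside JMP), every unordered pair of exogenous variables gets one two-headed arrow, so k exogenous variables need k(k - 1)/2 covariances. The variable names, including the dummies `Region_B` and `Region_C` for a hypothetical three-level control, are made up for this sketch.

```python
from itertools import combinations

def exogenous_covariances(exogenous):
    """Return every unordered pair of exogenous variables.

    Each pair corresponds to one two-headed arrow (covariance) in the path diagram.
    """
    return list(combinations(exogenous, 2))

# Hypothetical controls: "Online" (2 levels -> 1 dummy) and a 3-level "Region"
# (-> 2 dummies, Region_B and Region_C, with Region_A as the reference level).
exogenous = ["Privacy", "Online", "Region_B", "Region_C"]
pairs = exogenous_covariances(exogenous)
for left, right in pairs:
    print(f"{left} <-> {right}")
print(len(pairs))  # 4 * 3 / 2 = 6
```

The count grows quickly as dummies are added, which is why selecting all exogenous variables at once and adding their covariances in one step is convenient.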

Dealing with missing values:

One of the many useful features of SEM is its cutting-edge (and effortless) handling of missing data. In the example above, as long as your missing values are identified as such by the data table, you don't need to do anything... the SEM platform will automatically detect that there are missing values and use "full information maximum likelihood" for estimation, which means *all* available data will be used in your analysis. This assumes your data are missing at random (MAR) or missing completely at random (MCAR), which is also an assumption of multiple imputation, for example. To make sure the data table recognizes your missing values as such, they should show up as dots as in this image:

[Figure: Missing values in data table]

If you don't see the dots, go to Cols > Column Info... > Column Properties > Missing Value Codes, so you can specify which values should be recognized as missing.
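For intuition about what full information maximum likelihood does, here is a minimal sketch in plain Python. It is not JMP's implementation, and the function names are hypothetical. Under a multivariate normal model, each row contributes a log-likelihood computed from only the variables observed in that row, using the matching sub-vector of the means and sub-matrix of the covariance matrix, so no row is dropped. An estimator would then search for the means and covariances that maximize the summed value; this sketch only evaluates it.

```python
import math

def cholesky(a):
    """Lower-triangular L with L times its transpose equal to a (a positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L z = b for z by forward substitution."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def is_missing(v):
    return v is None or (isinstance(v, float) and math.isnan(v))

def casewise_loglik(row, mu, sigma):
    """Normal log-likelihood of one row using only its observed entries."""
    obs = [i for i, v in enumerate(row) if not is_missing(v)]
    if not obs:
        return 0.0  # a completely empty row carries no information
    diff = [row[i] - mu[i] for i in obs]
    sub_sigma = [[sigma[i][j] for j in obs] for i in obs]
    L = cholesky(sub_sigma)
    z = forward_solve(L, diff)  # z.z equals diff' * inverse(sub_sigma) * diff
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(len(obs)))
    p = len(obs)
    return -0.5 * (p * math.log(2 * math.pi) + log_det + sum(zi * zi for zi in z))

def fiml_loglik(data, mu, sigma):
    """Sum of casewise log-likelihoods: every row contributes what it has."""
    return sum(casewise_loglik(row, mu, sigma) for row in data)

mu = [0.0, 0.0]
sigma = [[1.0, 0.5], [0.5, 1.0]]
data = [[0.2, -0.1], [1.0, None], [float("nan"), 0.3]]
print(fiml_loglik(data, mu, sigma))
```

In contrast, listwise deletion would discard the second and third rows entirely; here they still inform the estimates through their observed values.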

Standardizing variables:

One way to standardize variables is to right-click on the data table column, then choose New Formula Column > Distributional > Standardize. This can be helpful when one has variables on very different scales. However, if you don't have that problem and simply want standardized estimates, the best approach is to conduct your analysis with the original variables, and then click on the red triangle menu of your fitted model... you'll find an option called "Standardized Parameter Estimates," which will give you what you need.
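A minimal sketch of the two ideas above in plain Python (function names hypothetical). Standardizing computes z = (x - mean) / sd over the non-missing values; this sketch uses the sample standard deviation (n - 1 denominator), so check that this matches the formula JMP generates for your column. A standardized path coefficient is the textbook rescaling b × SD(x) / SD(y), where in SEM SD(y) is the model-implied standard deviation of the outcome; it is shown only to explain what a standardized estimate means, not how JMP computes its report.

```python
import math
import statistics

def standardize(values):
    """Return z-scores (x - mean) / sd, ignoring NaN; NaN entries stay NaN."""
    observed = [v for v in values if not math.isnan(v)]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)  # sample SD, n - 1 denominator
    return [v if math.isnan(v) else (v - mean) / sd for v in values]

def standardized_coefficient(b, sd_x, sd_y):
    """Rescale an unstandardized path coefficient b from x to y."""
    return b * sd_x / sd_y

z = standardize([1.0, 2.0, 3.0, float("nan")])
print(z)  # [-1.0, 0.0, 1.0, nan]
print(standardized_coefficient(0.5, sd_x=2.0, sd_y=4.0))  # 0.25
```

Because the rescaling can be applied after fitting, analyzing the original variables and then requesting standardized estimates gives the same standardized coefficients without altering the data table.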

HTH,

Laura C-S
LauraCS
Staff

Re: Categorical variables in structural equation models

I should also add that using categorical variables in SEM with "multiple-group analysis" is fairly common and useful! If you're interested in learning more about that, please see this discussion: https://community.jmp.com/t5/Discussions/Categorical-Variables-and-SEM/td-p/930889 

Laura C-S
Gabriel
Level III

Re: Categorical variables in structural equation models

Thank you so much, @LauraCS. I was able to start the process. I'm just a little unsure how to link the covariances when there are several categorical variables (green box in the attached image). See the result of my first attempt. The CFI and the RMSEA suggest that something is not accounted for. What could I be missing? The logic of the covariances is also confusing, because some of my continuous variables could covary as well.
[Screenshot 2026-02-24 190410.png]
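For readers less familiar with these two fit indices, here is a minimal Python sketch of their textbook formulas (function names hypothetical). It uses N - 1 in the RMSEA denominator; some software uses N instead, so small differences from JMP's report are possible. Both indices look at how far the model chi-square exceeds its degrees of freedom; CFI compares that excess with the excess of the baseline model in which all variables are uncorrelated. Leaving out covariances that are actually nonzero inflates the model chi-square, which is one common reason both indices look poor.

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation (N - 1 in the denominator)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2, df, chi2_baseline, df_baseline):
    """Comparative fit index relative to the baseline (independence) model."""
    excess_model = max(chi2 - df, 0.0)
    excess_baseline = max(chi2_baseline - df_baseline, excess_model)
    if excess_baseline == 0.0:
        return 1.0  # neither model shows misfit beyond its degrees of freedom
    return 1.0 - excess_model / excess_baseline

# Made-up numbers purely to show the arithmetic:
print(round(rmsea(50.0, 10, 101), 3))      # 0.2
print(round(cfi(50.0, 10, 500.0, 15), 3))  # 0.918
```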

Gabriel Mulero
LauraCS
Staff

Re: Categorical variables in structural equation models

Thanks for sharing your model, @Gabriel! I recommend rearranging the variables in the diagram so the whole thing "flows" from left to right. I made a toy example here:

[Figure: Path model flowing left-to-right]

When the variables are displayed like this, we can identify the different "sets" of variables that *usually* need to covary (some exceptions to the rule arise with experimental data, where the design of the experiment ensures zero covariance). You should add covariances within each of these sets. The easiest way is to select the set of variables on the diagram, right-click over any of the selected variables (note: the right-click menu changes if you click over arrows or the blank area of the diagram!), and click "Add Covariance(s)", so all the corresponding covariances are added in one move. Select the next set and repeat. In my toy example, "Reput_1" and "Privacy_4" are each alone in their set, so they don't need any covariances added. The properly specified model should look like this:

[Figure: Path model with covariances]

It's worth clarifying that the double-headed arrow between "Privacy_3" and "Trust_3" represents a *residual* covariance as these are endogenous (outcome) variables.

Of course, one could omit any of the covariances displayed above if we have a strong hypothesis suggesting said covariance(s) should be zero.
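The "sets" in a left-to-right layout can be approximated programmatically. This is a minimal sketch in plain Python that assumes a layering heuristic (not a JMP feature): each variable is placed one column to the right of its right-most predictor, exogenous variables go in column 0, and variables sharing a column are paired up. Column 0 pairs are covariances among exogenous variables; pairs in later columns are residual covariances among endogenous variables. The path list is a hypothetical example, not the toy model in the screenshots, and any pair you have a strong reason to fix at zero can simply be dropped.

```python
from collections import defaultdict
from itertools import combinations

def variable_levels(paths):
    """Column per variable: 0 if exogenous, else 1 + max column of its predictors."""
    predictors = defaultdict(set)
    for src, dst in paths:
        predictors[dst].add(src)
    levels = {}

    def level(var, visiting=()):
        if var in levels:
            return levels[var]
        if var in visiting:
            raise ValueError(f"cycle detected at {var!r}")
        preds = predictors.get(var, set())
        levels[var] = 0 if not preds else 1 + max(level(p, visiting + (var,)) for p in preds)
        return levels[var]

    for src, dst in paths:
        level(src)
        level(dst)
    return levels

def within_level_covariances(paths):
    """Candidate covariances: every pair of variables sharing a column."""
    levels = variable_levels(paths)
    order = dict.fromkeys(v for edge in paths for v in edge)  # first-appearance order
    columns = defaultdict(list)
    for var in order:
        columns[levels[var]].append(var)
    return {lvl: list(combinations(columns[lvl], 2)) for lvl in sorted(columns)}

# Hypothetical model: two exogenous variables, two mediators, one outcome.
paths = [
    ("Online", "Reputation"), ("Privacy", "Reputation"),
    ("Online", "Trust"), ("Privacy", "Trust"),
    ("Reputation", "Purchase"), ("Trust", "Purchase"),
]
print(within_level_covariances(paths))
# {0: [('Online', 'Privacy')], 1: [('Reputation', 'Trust')], 2: []}
```

The heuristic mirrors the visual rule only roughly: a diagram arranged by hand may group variables differently, and the diagram, not the script, should drive the final specification.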

HTH,

Laura C-S
Gabriel
Level III

Re: Categorical variables in structural equation models

Thank you, @LauraCS. Maybe sample data from me would help, if you don't mind.

The controls in my experiments are CO2, Gen and Water. DAS are date variants. VPD, Wind and PAR are not affected by these factors, although they may change with DAS, since they are environmental factors. gsw predicts DeltaT, SWC predicts DeltaT, SWC predicts SoilT, and SoilT predicts DeltaT. LAI and height predict gsw, and both also predict SWC, SoilT, and DeltaT. The environmental factors affect or predict gsw and DeltaT.

Thank you.

Gabriel Mulero
