Discussions

Gabriel · Feb 24, 2026 02:32 AM

First, I am new to using SEM. I see that on the JMP platforms (JMP 19) there is no way to add categorical variables as controls within the SEM analysis. I have more than one categorical variable column.

Is there a way to do this that I am just not aware of? Also, is there a way to deal with NaNs and to standardize the continuous variables?

Gabriel Mulero

LauraCS · Feb 24, 2026 10:21 AM

Hi @Gabriel,

When categorical variables are used as "controls" or covariates, we need to follow the same approach as in regression analysis. That is, we create numeric, dummy coded, variables and enter them as predictors of the model.

Here's one example with a mediation model. Suppose we survey consumers about their perceptions of the Privacy and Reputation of an organization, and we also ask them about their intent to make a purchase. We theorize that higher perceptions of Privacy lead to better Reputation, which in turn increases Purchase Intentions. We also believe that Privacy directly impacts Purchase Intentions. The path diagram for this model looks like this:

Simple mediation model

Importantly, we know that some of these consumers only engage with the organization online (through their website/app) while other consumers engage in person. Thus, we want to control for these two ways to engage with the company. We have two levels of the categorical variable (online and in-person) so we only need one dummy variable. We create one labeled "Online," which has a value of 0 for in-person and 1 for online engagement. This variable is numeric/continuous so it can enter the analysis just like all the others. Because we have a mediation model, we can control for it by entering it as a predictor of the other endogenous (outcome) variables. Here's the path diagram that shows this:

Mediation model with a control variable

Notice there's a two-headed arrow (a covariance) between the Online and Privacy variables. This is the standard specification... that is, a covariance indicates that we don't have a hypothesis about the association between the Online and Privacy variables.

With multiple categorical variables to control for, you'd have to create multiple dummy variables and add them in a similar way. Make sure they predict the outcomes where you want to control for them... and make sure they covary with the other exogenous variables (those that don't have anything predicting them).
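As a rough illustration of the dummy coding described above (outside of JMP), here is a minimal Python sketch. The function name `dummy_code`, the column names, and the data are all hypothetical. It builds k-1 numeric 0/1 columns for a categorical column with k levels, with the reference level coded as all zeros.

```python
def dummy_code(rows, column, reference):
    """Return k-1 numeric 0/1 dummy columns for one categorical column.

    rows:      list of dicts, one per observation
    column:    name of the categorical column
    reference: the baseline level, coded as 0 in every dummy
    Missing values (None) stay missing in every dummy column.
    """
    levels = sorted({r[column] for r in rows if r[column] is not None})
    if reference not in levels:
        raise ValueError(f"{reference!r} is not a level of {column!r}")
    dummies = {}
    for level in levels:
        if level == reference:
            continue  # the reference level gets no column of its own
        dummies[f"{column}_{level}"] = [
            None if r[column] is None else int(r[column] == level)
            for r in rows
        ]
    return dummies


# Hypothetical data: "Engagement" has 2 levels, "Region" has 3 levels.
rows = [
    {"Engagement": "online", "Region": "north"},
    {"Engagement": "in-person", "Region": "south"},
    {"Engagement": "online", "Region": "west"},
    {"Engagement": None, "Region": "north"},
]

online = dummy_code(rows, "Engagement", reference="in-person")
region = dummy_code(rows, "Region", reference="north")
print(online)  # one dummy column for a 2-level variable
print(region)  # two dummy columns for a 3-level variable
```

Each resulting column is numeric, so it can enter the model as a predictor like any other variable. Choosing a different reference level changes how the coefficients are interpreted, not the fit of the model.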

Dealing with missing values:

One of the many useful features of SEM is its cutting-edge (and effortless) handling of missing data. In the example above, as long as your missing values are identified as such by the data table, then you don't need to do anything... the SEM platform will automatically detect there're missing values and use "full information maximum likelihood" for estimation, which means *all* available data will be used in your analysis. This assumes your data are missing at random (MAR) or missing completely at random (MCAR), which is also an assumption of multiple imputation, for example. To make sure the data table recognizes your missing values as such, they should show up as dots as in this image:

Missing values in data table

If you don't see the dots, go to Cols > Column Info... > Column Properties > Missing Value Codes, so you can specify which values should be recognized as missing.
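If the data are prepared outside of JMP before import, the same idea can be sketched in Python: map whatever codes stand for "missing" to a single missing marker. The codes in `MISSING_CODES` below are hypothetical examples, not a list JMP uses.

```python
import math

# Hypothetical codes that a data source might use for "missing".
MISSING_CODES = {-999, "NA", "NaN", ""}


def to_missing(value, codes=MISSING_CODES):
    """Return None for anything that should count as missing, else the value."""
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None  # a true floating-point NaN
    if value in codes:
        return None  # a sentinel code such as -999 or "NA"
    return value


raw = [4.2, -999, "NA", float("nan"), 3.1, ""]
clean = [to_missing(v) for v in raw]
print(clean)
```

Only the two observed values survive; everything else becomes `None`, which can then be written out as an empty cell so it is read as missing.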

Standardizing variables:

One way to standardize variables entails right-clicking on the data table column, then New Formula Column > Distributional > Standardize. This can be helpful when one has variables in very different scales. However, if you don't have that problem and simply want standardized estimates, the best approach is to conduct your analysis with the original variables, and then click on the red triangle menu of your fitted model... you'll find an option called "Standardized Parameter Estimates," which will give you what you need.
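For reference, standardizing a column amounts to subtracting its mean and dividing by its standard deviation. Here is a minimal Python sketch, assuming the sample standard deviation (n - 1 denominator) and leaving missing values untouched; the function name and data are hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Z-score a column: (x - mean) / sd, computed from observed values only."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    s = stdev(observed)  # sample SD (n - 1); an assumption of this sketch
    return [None if v is None else (v - m) / s for v in values]


x = [2.0, 4.0, None, 6.0]  # hypothetical column with one missing value
z = standardize(x)
print(z)
```

The standardized column has mean 0 and standard deviation 1 over the observed values, and the missing entry stays missing.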

HTH,

Laura C-S

LauraCS · Feb 24, 2026 10:22 AM

I should also add that using categorical variables in SEM with "multiple-group analysis" is fairly common and useful! If you're interested in learning more about that, please see this discussion: https://community.jmp.com/t5/Discussions/Categorical-Variables-and-SEM/td-p/930889

Laura C-S

Gabriel · Feb 24, 2026 12:07 PM

Thank you so much @LauraCS. I was able to start the process. Just a little unsure how to link the covariances in the case of having a number of categorical variables (green box in attached image). See my initial try result. The CFI and the RMSEA show there's something not accounted for. What could I be missing? And the logic of the covariance is confusing also because some of my continuous variables could be covaried.

Gabriel Mulero

LauraCS · Feb 24, 2026 02:17 PM

Thanks for sharing your model @Gabriel! I recommend rearranging the variables in the diagram so the whole thing "flows" from left-to-right. I made a toy example here:

Path model flowing left-to-right

When the variables are displayed like this, we can identify the different "sets" of variables that *usually* need to covary (some exceptions to the rule arise with experimental data, where the design of the experiment assures zero covariance). You should add covariances within each of these sets. The easiest way is to select the set of variables on the diagram, right-click over any of the selected variables (note: the right-click menu changes if you click over arrows or the blank area of the diagram!), and click "Add Covariance(s)", so all the corresponding covariances will be added in one move. Select the next set and repeat. In my toy example, "Reput_1" and "Privacy_4" are alone in their set, so these don't need any covariances added. The properly specified model should look like this:

Path model with covariances

It's worth clarifying that the double-headed arrow between "Privacy_3" and "Trust_3" represents a *residual* covariance as these are endogenous (outcome) variables.

Of course, one could omit any of the covariances displayed above if we have a strong hypothesis suggesting said covariance(s) should be zero.
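To make "add covariances within each set" concrete: a set of n variables gets n(n-1)/2 covariances, one per pair, and a variable that is alone in its set gets none. A small Python sketch with hypothetical placeholder names (not the variables in the diagram):

```python
from itertools import combinations


def covariances_within(variable_set):
    """List every pair of variables in one set; each pair is one covariance."""
    return list(combinations(variable_set, 2))


# Hypothetical sets, read left to right off a path diagram.
sets = {
    "exogenous": ["X1", "X2", "X3"],
    "mediators": ["M1", "M2"],
    "outcome": ["Y1"],
}

pairs_by_set = {name: covariances_within(members) for name, members in sets.items()}
for name, pairs in pairs_by_set.items():
    print(name, len(pairs), pairs)
```

Pairs are only formed inside a set, never across sets. For the mediators and outcomes, each pair corresponds to a residual covariance.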

HTH,

Laura C-S

Gabriel · Feb 24, 2026 03:21 PM

Thank you, @LauraCS. Maybe some sample data from me would be better, if you don't mind.

The controls in my experiments are CO2, Gen and Water. DAS are date variants. VPD, Wind and PAR are not affected by these factors, although they may change by DAS since they are environmental factors. gsw predicts DeltaT, SWC predicts DeltaT, SWC predicts SoilT, and SoilT predicts DeltaT. LAI and height predict gsw, and both also predict SWC, SoilT, and DeltaT.
The environmental factors affect or predict gsw and DeltaT.

Thank you.

Gabriel Mulero
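One way to organize the relationships listed in the post above is to write them down as (predictor, outcome) pairs and let the list show which variables have nothing predicting them. The Python sketch below transcribes only the paths stated in the post. The controls (CO2, Gen, Water) and DAS are left out because the post does not say which outcomes they should predict; that would need to be decided before fitting.

```python
# Paths as (predictor, outcome), transcribed from the description above.
paths = [
    ("gsw", "DeltaT"),
    ("SWC", "DeltaT"),
    ("SWC", "SoilT"),
    ("SoilT", "DeltaT"),
]
# LAI and height predict gsw, SWC, SoilT, and DeltaT.
for predictor in ("LAI", "height"):
    for outcome in ("gsw", "SWC", "SoilT", "DeltaT"):
        paths.append((predictor, outcome))
# The environmental factors predict gsw and DeltaT.
for predictor in ("VPD", "Wind", "PAR"):
    for outcome in ("gsw", "DeltaT"):
        paths.append((predictor, outcome))

predictors = {p for p, _ in paths}
outcomes = {o for _, o in paths}

# Exogenous variables: they predict something but nothing predicts them.
exogenous = sorted(predictors - outcomes, key=str.lower)
# Endogenous variables: at least one arrow points at them.
endogenous = sorted(outcomes, key=str.lower)

print("exogenous:", exogenous)
print("endogenous:", endogenous)
```

Following the advice earlier in the thread, the exogenous variables form one set that would usually covary with each other, and any dummy-coded controls would join that set once their paths are specified.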

Categorical variables in structural equation models
