Hi @Gabriel,
When categorical variables are used as "controls" or covariates, we follow the same approach as in regression analysis. That is, we create numeric, dummy-coded variables and enter them as predictors in the model.
Here's one example with a mediation model. Suppose we survey consumers about their perceptions of the Privacy and Reputation of an organization, and we also ask them about their intent to make a purchase. We theorize that higher perceptions of Privacy lead to better Reputation, which in turn increases Purchase Intentions. We also believe that Privacy directly impacts Purchase Intentions. The path diagram for this model looks like this:
Simple mediation model
Importantly, we know that some of these consumers only engage with the organization online (through its website/app) while other consumers engage in person. Thus, we want to control for these two ways of engaging with the company. Because the categorical variable has two levels (online and in-person), we only need one dummy variable. We create one labeled "Online," which has a value of 0 for in-person and 1 for online engagement. This variable is numeric/continuous, so it can enter the analysis just like all the others. Because we have a mediation model, we can control for it by entering it as a predictor of the other endogenous (outcome) variables. Here's the path diagram that shows this:
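In JMP you'd typically create the dummy column in the data table, but the coding itself is simple. Here's a minimal sketch in Python/pandas with made-up data (the "Channel" column and its values are hypothetical):

```python
import pandas as pd

# Hypothetical survey data; "Channel" records how each consumer engages.
df = pd.DataFrame({"Channel": ["online", "in-person", "online", "in-person"]})

# One dummy for a two-level categorical: 1 = online, 0 = in-person.
df["Online"] = (df["Channel"] == "online").astype(int)

print(df["Online"].tolist())  # [1, 0, 1, 0]
```

The resulting "Online" column is numeric, so it can enter the model as a predictor like any continuous variable.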
Mediation model with a control variable
Notice there's a two-headed arrow (a covariance) between the Online and Privacy variables. This is the standard specification... that is, a covariance indicates that we don't have a hypothesis about the association between the Online and Privacy variables.
With multiple categorical variables to control for, you'd have to create multiple dummy variables and add them in a similar way. Make sure they predict the outcomes where you want to control for them... and make sure they covary with the other exogenous variables (those that don't have anything predicting them).
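For a categorical variable with k levels you need k - 1 dummies. As a sketch of that rule in Python/pandas (the "Region" variable and its levels are hypothetical):

```python
import pandas as pd

# Hypothetical three-level categorical: 3 levels -> 2 dummy columns.
df = pd.DataFrame({"Region": ["north", "south", "west", "north"]})

# drop_first=True drops the reference level ("north"),
# leaving k - 1 dummies; cast to int for numeric 0/1 columns.
dummies = pd.get_dummies(df["Region"], prefix="Region", drop_first=True).astype(int)
df = df.join(dummies)

print(list(dummies.columns))  # ['Region_south', 'Region_west']
```

Each dummy column then enters the path diagram as a predictor of the outcomes you want to control, with covariances to the other exogenous variables.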
Dealing with missing values:
One of the many useful features of SEM is its cutting-edge (and effortless) handling of missing data. In the example above, as long as your missing values are identified as such in the data table, you don't need to do anything... the SEM platform will automatically detect that there are missing values and use "full information maximum likelihood" (FIML) for estimation, which means *all* available data will be used in your analysis. This assumes your data are missing at random (MAR) or missing completely at random (MCAR), which is also an assumption of multiple imputation, for example. To make sure the data table recognizes your missing values as such, they should show up as dots, as in this image:
Missing values in data table
If you don't see the dots, go to Cols > Column Info... > Column Properties > Missing Value Codes, so you can specify which values should be recognized as missing.
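The Missing Value Codes property does this recoding for you inside JMP. If you were preparing the data elsewhere, the same idea looks like this in Python/pandas (the -99 sentinel code and the column are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical raw data where -99 was used as a missing-data code.
df = pd.DataFrame({"Privacy": [5, -99, 3, 4]})

# Recode the sentinel to a true missing value so downstream
# estimation (e.g., FIML) recognizes it as missing.
df["Privacy"] = df["Privacy"].replace(-99, np.nan)

print(int(df["Privacy"].isna().sum()))  # 1
```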
Standardizing variables:
One way to standardize variables is to right-click on the data table column, then select New Formula Column > Distributional > Standardize. This can be helpful when you have variables on very different scales. However, if you don't have that problem and simply want standardized estimates, the best approach is to conduct your analysis with the original variables and then click on the red triangle menu of your fitted model... you'll find an option called "Standardized Parameter Estimates," which will give you what you need.
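For reference, the Standardize formula is just a z-score: subtract the mean and divide by the standard deviation. A quick sketch in Python/pandas with made-up values:

```python
import pandas as pd

# Hypothetical column to standardize.
df = pd.DataFrame({"Privacy": [2.0, 4.0, 6.0, 8.0]})

# z-score: center at the mean, scale by the sample standard deviation.
df["Privacy_std"] = (df["Privacy"] - df["Privacy"].mean()) / df["Privacy"].std()
```

The standardized column has mean 0 and standard deviation 1, which puts variables measured on different scales on a common footing.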
HTH,
Laura C-S