Hi lujc07,
Resources for SEM:
I highly recommend seeking out some of Laura's (@LauraCS) resources and posts as a good starting point for using SEM in JMP.
A good one to watch is: https://community.jmp.com/t5/Discovery-Summit-Americas-2020/ABCs-of-Structural-Equations-Models-2020...
Additionally, there are a number of good SEM texts out there. A good starter is "Principles and Practice of Structural Equation Modeling" by Rex Kline, which has an applied bent. Other authors you'll see are Ken Bollen, Schumacker & Lomax, Rick Hoyle, and many others; all of their texts are great resources as well, and some are more technical than others. Most of the texts will likely use examples from education and the social sciences, but the approaches apply generally; the technique is simply used widely in those areas.
Re: Q2: I think it would generally be considered most appropriate for SEM to follow exploratory factor analysis (EFA / FA) rather than some of the other dimension reduction techniques you listed (PCA, clustering, etc.). However, if you're reading the literature, you might see others, such as PCA, used beforehand. A useful ordering to keep in mind is Correlation Matrix -> EFA -> CFA -> SEM.
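If it helps to see the first step of that ordering concretely, here is a minimal Python sketch that computes a Pearson correlation matrix for a set of items before moving on to EFA. It is not how JMP computes it internally, and the item names and scores are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant item")
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(data):
    """data: dict mapping item name -> list of scores (all the same length).
    Returns the item names and a nested-list correlation matrix in that order."""
    names = list(data)
    return names, [[pearson(data[a], data[b]) for b in names] for a in names]

# Hypothetical items, just to show the call
names, R = correlation_matrix({
    "v1": [1, 2, 3, 4, 5],
    "v2": [2, 4, 6, 8, 10],
    "v3": [5, 4, 3, 2, 1],
})
```

Scanning this matrix first (for items that barely correlate with anything, or that correlate almost perfectly with another item) gives you a sense of what the EFA is likely to show.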
Re: Q1: Steps 1-6 are a good general set of steps for moving into SEM. In fact, I think people sometimes forget these important earlier steps and try to build a model right away, which can lead to headaches. A couple of suggestions I have:
Step 5:
run factor analysis for each group. In each group, I selected 2-3 variables that have high and similar loadings in the same factor as indicators for a latent variable (this step tends to be subjective).
In your FA step, I would consider including all the variables (v1-v9, etc.) associated with your latent variables "water," "air," and "habitat" in a single FA (you may have done this, but I wasn't quite sure). This will help you identify variables that cross-load onto these other factors and decide whether to include them. An item may look great on its own in a small subset, but once it's included with the other variables it may not be one you want to retain. It will also help you see whether these factors separate out as you'd expect.
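As a rough illustration of what to look for once all the items are in one EFA, the Python sketch below screens a rotated loading matrix: it assigns each item to the factor it loads on most strongly, flags items that also load noticeably on another factor, sets aside items that load weakly everywhere, and reports factors left with fewer than three clean indicators. The loadings, the item-to-factor pattern, and the cutoffs (0.5 for the primary loading, 0.3 for a cross-loading) are hypothetical rules of thumb chosen for the example, not JMP output or fixed standards.

```python
def screen_loadings(loadings, factor_names, primary_min=0.5, cross_max=0.3,
                    min_indicators=3):
    """Sort items from a rotated EFA loading matrix into clean, cross-loading, or weak.

    loadings: dict mapping item name -> list of loadings, one per factor,
              in the same order as factor_names.
    Cutoffs are illustrative rules of thumb, not standards.
    """
    clean = {name: [] for name in factor_names}
    cross_loaders, weak = [], []
    for item, row in loadings.items():
        sizes = [abs(v) for v in row]
        primary = max(range(len(sizes)), key=lambda f: sizes[f])
        if sizes[primary] < primary_min:
            weak.append(item)            # no factor clearly claims this item
        elif any(v >= cross_max for f, v in enumerate(sizes) if f != primary):
            cross_loaders.append(item)   # loads on its own factor and another one
        else:
            clean[factor_names[primary]].append(item)
    # Factors left with too few clean indicators (rule of thumb: at least 3)
    thin = [name for name, items in clean.items() if len(items) < min_indicators]
    return clean, cross_loaders, weak, thin

# Hypothetical rotated loadings; columns are water, air, habitat
loadings = {
    "v1": [0.72, 0.10, 0.05], "v2": [0.68, 0.12, 0.08], "v3": [0.65, 0.35, 0.02],
    "v4": [0.05, 0.70, 0.10], "v5": [0.10, 0.66, 0.04], "v6": [0.08, 0.61, 0.12],
    "v7": [0.02, 0.09, 0.75], "v8": [0.11, 0.05, 0.69], "v9": [0.15, 0.20, 0.40],
}
clean, cross_loaders, weak, thin = screen_loadings(
    loadings, ["water", "air", "habitat"])
```

With these made-up numbers, v3 is flagged as a cross-loader, v9 as weak, and "water" and "habitat" each end up with only two clean indicators, which is exactly the kind of thing that is easy to miss when each subset is factored on its own.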
Additionally, use at least 3 variables per latent variable (a good rule of thumb). If you have more than that which are appropriate and load well onto a factor (such as 4 or 5), bring those along as well; there's no need to toss them out if they measure the construct well. If you have enough data, note that it is ideal to run the EFAs on one part of the data (an exploration set) and then follow up with CFAs on another part (a validation set). The factor structure from the EFAs is driven by the data, so following it up with CFAs on new data will help you see whether the item parameters and factors replicate well.
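If you do have enough data to split, a minimal Python sketch of a random split into an exploration half (for the EFAs) and a validation half (for the CFAs) might look like the following. The function name, the 50/50 split, and the fixed seed are assumptions for illustration; the seed just makes the split reproducible.

```python
import random

def split_sample(rows, holdout_fraction=0.5, seed=2024):
    """Randomly split rows into an exploration set (EFA) and a validation set (CFA)."""
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be between 0 and 1")
    rng = random.Random(seed)        # fixed seed so the split can be reproduced
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_holdout = int(round(len(shuffled) * holdout_fraction))
    exploration = shuffled[n_holdout:]
    validation = shuffled[:n_holdout]
    return exploration, validation

# e.g., 400 respondents identified by row index
efa_rows, cfa_rows = split_sample(range(400))
```

The important part is that the two sets do not overlap, so the CFA is checking the EFA-derived structure on data that played no part in finding it.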
Step 6:
- 6. run confirmatory factor analysis with selected variables (see diagram below). If pass (based on fit indices, indicator reliability, composite reliability, construct maximal reliability, construct validity matrix), use these latent variables to build structural equation model with observed variables (human development, animal group). If not pass, change the combination of variables until pass.
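For reference, a few of the reliability indices listed in Step 6 can be computed by hand from a factor's standardized CFA loadings, which can be a useful sanity check. The Python sketch below assumes standardized loadings (each with absolute value below 1) and uncorrelated residuals, and uses the common textbook formulas for composite reliability, average variance extracted (AVE), and maximal reliability (coefficient H). The loadings are made-up numbers, and the values JMP reports could differ if its conventions differ.

```python
def reliability_indices(loadings):
    """Composite reliability, AVE, and maximal reliability (coefficient H)
    for one factor, from standardized loadings, assuming uncorrelated residuals."""
    squared = [lam * lam for lam in loadings]
    residual = [1.0 - s for s in squared]        # standardized residual variances
    sum_lam = sum(loadings)
    # Composite reliability: (sum of loadings)^2 / ((sum of loadings)^2 + sum of residuals)
    cr = sum_lam ** 2 / (sum_lam ** 2 + sum(residual))
    # Average variance extracted: mean squared loading
    ave = sum(squared) / len(squared)
    # Coefficient H: weights each indicator by lambda^2 / (1 - lambda^2)
    ratio = sum(s / e for s, e in zip(squared, residual))
    h = ratio / (1.0 + ratio)
    return cr, ave, h

cr, ave, h = reliability_indices([0.8, 0.7, 0.6])   # hypothetical loadings
```

For these hypothetical loadings the three values come out at roughly 0.74, 0.50, and 0.77. Coefficient H is never below composite reliability because it weights the stronger indicators more heavily.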
After you've settled on the indicators for your latent variables (using all the information you noted in your post) and the observed variables you want to use, think about the different SEMs you want to test. The goal of SEM is to use theory, or more generally your knowledge of an area, to test multiple competing models. A competing model is one that differs in its parameterization. For a simple example, if you removed the path between "Human Development 2" and "Habitat," that would result in a new, slightly simpler model that you could test (a nested model). The model differs by 1 parameter and would produce a new set of fit indices and results. If that model fits practically as well as the model that included the path, we might conclude that the more parsimonious model is preferable, since it fits about as well as the more complex one. You can expand on this idea and run multiple models until you settle on a model that exhibits good fit and is reasonable for your theory. When modifying your model to try alternatives, I recommend being intentional with your changes to paths and testing things that make sense to you and the theory you're interested in.
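One common way to compare two nested models like this is a chi-square difference (likelihood-ratio) test: subtract the full model's chi-square and df from the restricted model's and look up the difference on a chi-square distribution. The Python sketch below is a minimal stdlib-only version that assumes both models were fit by ordinary maximum likelihood to the same data (robust or scaled estimators need a corrected difference test), and the chi-square values in the example are hypothetical, not from your models.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite series with half-integer gamma values
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total

def chi2_difference_test(chi2_restricted, df_restricted, chi2_full, df_full):
    """Chi-square difference test for two nested models fit by ML."""
    delta_chi2 = chi2_restricted - chi2_full
    delta_df = df_restricted - df_full
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)

# Hypothetical fits: full model vs. the same model with one path removed
d_chi2, d_df, p = chi2_difference_test(54.1, 25, 52.3, 24)
```

Here dropping the path raises the chi-square by 1.8 on 1 df, which is not a significant worsening of fit (p is between 0.10 and 0.20), so in this made-up case the simpler model would be the more parsimonious choice. It is still worth looking at the other fit indices side by side rather than relying on this test alone.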
In SEM, you're typically testing models (by changing the paths) against each other to see how well they reproduce the means and covariance structure of the data relative to how complex they are. This is a little different from an approach like regression, where you may be interested in finding the subset of variables that best predicts an outcome or maximizes R-squared.
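To make "relative to how complex the model is" a bit more concrete: a model's degrees of freedom are the number of observed moments (variances, covariances, and, when means are modeled, means) minus the number of free parameters, and fit indices such as RMSEA use those degrees of freedom to penalize complexity. The Python sketch below assumes a single-group model, and the variable counts, parameter counts, chi-square, and sample size are all hypothetical. Software can also differ slightly in the RMSEA formula (some use N rather than N - 1).

```python
import math

def model_df(n_observed, n_free_params, include_means=True):
    """Degrees of freedom = observed moments minus free parameters (single group)."""
    p = n_observed
    moments = p * (p + 1) // 2          # unique variances and covariances
    if include_means:
        moments += p                     # plus one mean per observed variable
    return moments - n_free_params

def rmsea(chi2, df, n):
    """Point estimate of RMSEA using N - 1; values near 0 indicate close fit."""
    if df <= 0:
        raise ValueError("RMSEA is undefined for a model with df <= 0")
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# Hypothetical: 9 observed variables, covariance structure only, 21 free parameters
df = model_df(9, 21, include_means=False)    # 45 moments - 21 parameters
fit = rmsea(52.3, df, 300)                   # hypothetical chi-square and N
```

Adding a path uses up a degree of freedom, so a more complex model has to lower its chi-square by enough to justify that extra parameter; that trade-off is what the model comparisons above are weighing.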
I hope this helps. The resources I provided will have considerably more depth than what I was able to provide here.