
Specifying and Fitting Models

Published on 11-07-2024 03:32 PM by Community Manager | Updated on 11-07-2024 05:42 PM

JMP offers many linear modeling options. See the available Fit Model capabilities and learn when and how to use the ones that are most useful for continuous, categorical and/or complex data sets where there may be more predictors than observations.

 

See how to:

  • Understand basic regression
  • Use Standard Least Squares
    • Understand the basic assumptions: the response is normally distributed and there is no multicollinearity among the factors
    • Handle a continuous response with continuous or categorical factors
    • Interpret the line of best fit, which minimizes the error between the points and the regression 'curve'
    • Handle main effects, interactions, etc.
    • Apply a Box-Cox transformation if needed
  • Use Stepwise Regression
    • Understand that Stepwise Regression is like Standard Least Squares, but adds terms in 'steps' to optimize the set of factors in the model
    • Specify the 'Stopping Rule' and 'Direction' that control the process
    • Make the model using a button click
  • Use Logistic: Ordinal & Nominal Regression to provide a probability of a particular outcome
    • Understand that these techniques are similar to Standard Least Squares, but use an ordinal or nominal (categorical) response
  • Understand that sometimes standard regression is not enough
    • JMP Pro Generalized Regression can be useful for non-normality and highly correlated data
    • Partial Least Squares (PLS) is useful when factors exceed observations
    • Model Screening can be used to find the best model the quickest
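
The "line of best fit" idea behind Standard Least Squares can be sketched outside JMP for the simplest case of one continuous factor. This is only an illustration in Python, not how JMP implements Fit Model; the function names and the data are made up for the example.

```python
def fit_line(x, y):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_error(x, y, intercept, slope):
    """Sum of squared vertical distances between the points and the line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Made-up data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_line(x, y)
best_sse = sum_squared_error(x, y, b0, b1)
```

Any other intercept or slope gives a larger sum of squared errors on the same data, which is what "minimizes error" means here.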

Reminder: In JMP, a p-value between .01 and .05 is colored red; a p-value less than .01 is orange; a p-value greater than .05 is black. Color-coding helps when you are analyzing many effects at once and want visual cues of statistical significance in the reports. In simple terms, a low p-value suggests that the data are inconsistent with the null hypothesis, and that there really is an effect out there in the population.
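
The color rule in the reminder can be written down as a small lookup. This is a Python sketch of the rule as stated here, not JMP code; the function name is hypothetical, and how the exact boundary values .01 and .05 are treated is an assumption.

```python
def p_value_color(p):
    """Map a p-value to the report color described in the reminder above."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p < 0.01:
        return "orange"
    if p <= 0.05:  # assumption: the boundary values .01 and .05 count as red
        return "red"
    return "black"


colors = [p_value_color(p) for p in (0.001, 0.03, 0.2)]
```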

 

Q: What is VIF?

A: The variance inflation factor (VIF) is reported for each term in the model. High VIF values indicate a collinearity issue among the terms in the model. The question it helps answer is: can we disentangle the effects of the terms on the response?
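
As a rough illustration of what the number means: the VIF for a term is 1 / (1 - R²), where R² comes from regressing that term on the other terms in the model. With only two predictors, that R² is the squared correlation between them, which keeps the sketch short. The Python below is an illustration with made-up data, not JMP's implementation, and the function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Made-up factor settings: correlated (r = 0.8) versus uncorrelated (r = 0)
vif_correlated = vif_two_predictors([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
vif_uncorrelated = vif_two_predictors([-2, -1, 0, 1, 2], [2, -1, -2, -1, 2])
```

A VIF of 1 means the predictor shares no linear information with the other one; the value grows without bound as the correlation approaches plus or minus 1.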

 

Q: Can we extrapolate in Prediction Profiler?

A: Yes. If you go beyond the range of your data in the Prediction Profiler, you can turn on Extrapolation Control to get a warning if/where extrapolation is not sensible. It will flash a red text warning if there is an issue. Be careful when extrapolating from a model, because you may not have data to back up the results.

 

[Image: Extrapolation Control.JPG]

 

Q: When do you use "cross vs nest vs macros" in model specification?

A: These are different ways to construct model effects, each appropriate for different use cases. Cross creates interaction and polynomial effects by crossing two or more variables. Nest creates nested effects. Macros generates effects for commonly used models. See the documentation for more details.
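
To make the idea of crossing concrete, here is a small Python sketch that builds effect names the way a full-factorial style macro would: every main effect plus every crossing of two or more factors. It only illustrates the concept; the function names are hypothetical, and the order in which JMP lists the effects may differ.

```python
from itertools import combinations


def cross(*factors):
    """Name of the effect formed by crossing the given factors, e.g. A*B."""
    return "*".join(factors)


def full_factorial(factors):
    """All main effects and all crossings of two or more of the factors."""
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append(cross(*combo))
    return terms


effects = full_factorial(["A", "B", "C"])
# effects == ["A", "B", "C", "A*B", "A*C", "B*C", "A*B*C"]
```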

 

Q: What should we do if the distribution is still not normal after the Box-Cox transformation?

A: Look at the data and apply your domain knowledge: is something influencing the results, is there a pattern, is there enough data? Or, use a more robust tool. We use Generalized Regression in JMP Pro.
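
For reference, the textbook Box-Cox transformation is (y^λ - 1) / λ for λ ≠ 0 and ln(y) for λ = 0, defined for positive responses only. The Python sketch below shows that textbook form; the function name is hypothetical, and JMP's own report may scale the transformed values differently.

```python
import math


def box_cox(values, lam):
    """Textbook Box-Cox transform of strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


log_version = box_cox([1.0, math.e], 0)   # lambda = 0 is the log transform
sqrt_version = box_cox([4.0, 9.0], 0.5)   # lambda = 0.5 is a shifted, scaled square root
```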

 

Q: Can I consider the Stepwise approach as a kind of factor-screening?

A: Stepwise adds terms one at a time, but in this example we also included many interactions as candidate terms. Factor screening typically looks only at main effects.

 

Q: What to do if R square is very low?

A: It depends. R-square is a relative measure, so it depends on what your threshold for low is. A low R-square typically indicates that something related to your response is not captured by the model. You may need to add more terms to your model.
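
As a reminder of what the statistic measures, R-square is 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean response. The Python sketch below uses made-up numbers and a hypothetical function name.

```python
def r_squared(observed, predicted):
    """1 - SSE/SST: share of the response variation explained by the model."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


observed = [2.0, 4.0, 6.0, 8.0]
perfect = r_squared(observed, observed)               # predictions match exactly
mean_only = r_squared(observed, [5.0, 5.0, 5.0, 5.0])  # model predicts only the mean
```

A model that predicts nothing beyond the mean gives 0, and a perfect fit gives 1, so "low" is always relative to how much variation you expect to be explainable.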

 

Q: How does GenReg differ from Generalized Linear Model? When do we use which?

A: To compare, see documentation on Generalized Linear Model and Generalized Regression Models.

 

Q: Why are you getting a ChiSq response vs a p-value?

A: See the documentation for details. Remember, if you have a question about a report, use the Question Mark (?) icon to get an answer.

After the session, JMP Technical Support Engineer Patrick Giuliano (@PatrickGiuliano) wrote: The Prob > ChiSquare is itself a p-value; in the case Peter demonstrated, it is the one associated with the Chi-Squared distribution (versus the sampling distribution of the t-statistic, for which JMP would label the p-value "Prob > abs(t)"). I think Chi-Squared distributions are better estimators for "ill-conditioned" data, such as data where we have censoring (which isn't going to be normally distributed).
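
One way to see that Prob > ChiSquare is an ordinary p-value: for a test with 1 degree of freedom, the chi-square statistic is the square of a standard normal z statistic, and both give the same tail probability. The Python sketch below covers only that 1-degree-of-freedom case and uses the normal distribution as the large-sample stand-in for the t distribution; the function names are hypothetical.

```python
import math


def prob_gt_chisq_1df(chi_square):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2.0))


def prob_gt_abs_z(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


z = 1.96
p_from_z = prob_gt_abs_z(z)
p_from_chisq = prob_gt_chisq_1df(z ** 2)  # same test, expressed as a chi-square
```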

 

Q: I’m using JMP 17 (not the Pro version). How should I deal with a distribution that is a mixture of several distributions and none of Generalized Linear Regression distributions (Normal, Binomial, Poisson or Exponential) suits?

A: Spend some time digging into the data to see what is causing the mixture. Maybe there is a lot of data, maybe there is a categorical factor whose levels each have a different distribution, or maybe there is simply not much data. Try to identify what is going on, possibly get more data, or use the Simulator in the Profiler (access it from the red triangle) to identify what might be going on.

 

Q: Can you talk about fitting logistic models?

A: Logistic models have non-continuous (ordinal or nominal) responses, and we are predicting the likelihood of an observation having a particular response value. The technique is similar to Standard Least Squares, but the response is not continuous.
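
For a two-level response, the link between the linear model and the predicted probability can be sketched as follows. This is a generic Python illustration, not JMP output: the intercept and slope are made-up values, the function names are hypothetical, and the sign convention in a JMP report may differ depending on which response level is modeled.

```python
import math


def logistic(eta):
    """Map a linear predictor to a probability between 0 and 1."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    # Equivalent form that avoids overflow for very negative values
    e = math.exp(eta)
    return e / (1.0 + e)


def predicted_probability(intercept, slope, x):
    """Probability of the modeled outcome at factor setting x."""
    return logistic(intercept + slope * x)


# Hypothetical coefficients, for illustration only
p_low = predicted_probability(-3.0, 0.5, 2.0)
p_high = predicted_probability(-3.0, 0.5, 10.0)
```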

 

Q: How do I know the Best model in Stepwise Regression?

A: You will see it in the report. Check the circle to the left of it and JMP will run it. 

 

[Image: Stepwise Best.JPG]

 

Questions from a previous session answered by Peter Polito @Peter_Polito  and Clovis Weisbart @cweisbart :

 

Q: How do you get to the Fit Stepwise platform?

A: There is a drop-down option at top right corner when you are in the Fit Model platform. You can change it from Standard Least Squares to Stepwise.

 

Q: By saving the Script, does this mean that, for a new set of data, you could delete the existing one, replace it with the new one, save it under a new name, and run it to get new results?

A: Yes, and the new data would need to have the same variable names for the script to work.

 

Q: What is more important in your linear regression, normality of your raw data for the output or normality of the residuals?

A: Both are important. If you have non-normal responses (output), it is a matter of degree: if the response is only slightly non-normal, then you're probably okay. If it's very skewed, then it's not okay. And if you have non-normality in your residuals, that's not okay. That means that you have some sort of bias or some sort of factor that is at play in your model that is unaddressed.

 

 

Q: With the example where a gamma distribution was the best, would you have opted for a transformation or for generalized linear modeling, or is some other approach better?

A: When I have non-normal data, I use Generalized Regression, which is a JMP Pro feature. I would go to Analyze>Fit Model and then drag my 12 month cumulative into my response. From here I choose Generalized Regression, which gives me an option to apply the distribution of that particular response, so I could select a gamma distribution and it will be accounted for in the model construction. I do this any time my data is not normal. Because I have JMP Pro, I usually just go to this anyway, because when JMP Pro runs a generalized regression, it always gives you a Standard Least Squares output in addition to the Generalized Regression output.

 

 

A: JMP recursive partitioning is CART, not CHAID. In our classification and regression trees, we have binary splits at each point, whereas CHAID allows for multiple splits of a variable at each step. See the documentation on the statistical details for Partition.

 

Q: Is there a situation in which you would keep factors with p-values greater than .05 in your model?

A: Yes, there are at least two instances, and there may be others on a case-by-case basis. Sometimes the main effect is insignificant but it is part of a significant interaction. Or, you may have a factor that you know is integral to the final product. Here is an example:

 

 

 

Questions answered at previous sessions on this topic:

 

Q: If two predictors/factors are strongly correlated with one another, and both strongly affect the Y in the model, will JMP just choose the one you listed first in the Fit Model platform as having a low p-value, then giving the covarying one a very high p-value since it effectively doesn't "add" anything given said covariance?             

A: Multicollinearity is always a good question to investigate. Variance Inflation Factor (VIF) is one potential option. Within the Parameter Estimates table in the report window you can find an option to assess the VIF by right-clicking and selecting VIF under Columns.

 

Q: If you have too many factors, where you could not do Backward Stepwise, can you still do Forward Stepwise?

A: You should be able to specify a forward selection model under the "Direction" option. With many, many potential predictors you might employ the Predictor Screening platform, which is an incredibly powerful platform for variable selection and identifying the top factors. From there you might choose a smaller subset of factors to focus on. Predictor Screening is available in all versions of JMP, not just JMP Pro.      

 

Q: Does Stepwise accommodate multicollinearity since it will simply be unlikely to choose redundant model terms?

A: That might be too advanced for this session and could be construed as "don't worry about it; stepwise will take care of you if you are too lazy to pull collinear terms on your own."

 




Start:
Mon, Mar 25, 2024 02:00 PM EDT
End:
Mon, Mar 25, 2024 03:00 PM EDT