This video, recorded using JMP 18, was posted in May 2025.
Do you want to use predictive models and need to understand or review the basics first? Have you tried predictive modeling but are a bit unsure whether to trust the accuracy of the models you develop? Are you relying on extrapolation because you can’t measure at or beyond certain limits?
In this Mastering JMP session, we explore tools in JMP and JMP Pro for building predictive models and avoiding common pitfalls. We will:
- Validate your model to prevent overfitting.
- Avoid nonsensical predictions.
- Address multicollinearity of input variables.
Questions answered by @andreacoombs, @momena (Momena Monwar), @Jed_Campbell, @charlie_whitman, and @Bill_Worley at the live webinar:
Q: What does collinearity mean?
A: Collinearity is a linear relationship between two explanatory variables. Two variables are perfectly collinear if there is an exact linear relationship between them, so their correlation is equal to 1 or −1. Collinearity among more than two factors is typically called multicollinearity.
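As a minimal illustration outside JMP, here is a Python sketch (using numpy; the variable names are made up for the example) showing that an exact linear relationship gives a correlation of exactly 1, while a noisy relationship does not:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 3 * x1 + 5                             # exact linear function of x1: perfectly collinear
x3 = x1 + rng.normal(scale=0.1, size=100)   # strongly, but not perfectly, correlated

print(np.corrcoef(x1, x2)[0, 1])  # 1.0 (would be -1.0 for a negative slope)
print(np.corrcoef(x1, x3)[0, 1])  # close to, but below, 1.0
```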
Q: Can you use the Prediction Profiler for multicollinearity?
A: In the Prediction Profiler, you can see from changes in the slope whether there is any collinearity or interaction effect.
Q: Can you look at interaction terms in the predictor screening?
A: Predictor Screening is a quick tool for examining column contributions based on a Bootstrap Forest. You can’t see interaction terms directly in the Predictor Screening platform interface. However, you can run the Bootstrap Forest, turn on the Profiler, and then select the interaction to look at it.
Q: Will JMP add a Null Factor to the Predictor Screening platform as a way to double-check which variables should be included versus a truly random variable? I'm thinking of something like the Null Factor in the auto-validation talk given some years ago.
A: You can add a random column or two, then include those columns among the Xs in the Predictor Screening platform. Anything ranked below that random variable, you probably wouldn't want to include.
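A sketch of the same idea outside JMP, using scikit-learn's RandomForestRegressor as a stand-in for Bootstrap Forest (the column names and data are invented for illustration): append a pure-noise column and use its importance as the cutoff.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 200
X = pd.DataFrame({
    "temp":     rng.normal(size=n),
    "pressure": rng.normal(size=n),
    "noise_1":  rng.normal(size=n),   # "null factor": a purely random column
})
y = 2 * X["temp"] + 0.5 * X["pressure"] + rng.normal(scale=0.5, size=n)

forest = RandomForestRegressor(n_estimators=100, random_state=1).fit(X, y)

# Rank predictors; anything at or below the random column is suspect
for name, imp in sorted(zip(X.columns, forest.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name:10s} {imp:.3f}")
```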
Q: How many predictors are too many? Should we use the VIF value as a guide, or can some models contain 50+ predictors?
A: Typically, once you get beyond 5-10 predictors, you run into collinearity or multicollinearity. This video and its materials cover VIF and collinearity.
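To illustrate the VIF check outside JMP, here is a sketch using statsmodels' variance_inflation_factor (the data and column names are invented for the example):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
df = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.2, size=n),  # nearly collinear with x1
    "x3": rng.normal(size=n),                  # independent predictor
})

X = sm.add_constant(df)  # VIF assumes a model matrix with an intercept
for i, col in enumerate(X.columns):
    if col != "const":
        print(col, variance_inflation_factor(X.values, i))
# Common rules of thumb flag VIF > 5 (strict) or > 10 (lenient) as problematic.
```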
Q: Why don't you want to include variables that are correlated/collinear? Why do you delete them?
A: If you are using a Neural Net or a Bootstrap Forest, for example, you can leave both in. But if you're building a linear model and those parameter estimates are correlated, then no, you do not want them both included. You can keep one in and remove the other.
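A quick illustration of why, using Python/statsmodels rather than JMP (the data are simulated for the example): with two nearly identical predictors in an ordinary least squares fit, the standard errors balloon, and dropping one stabilizes the estimate.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # near-duplicate of x1
y = 3 * x1 + rng.normal(size=n)

# Both predictors included: inflated standard errors, unreliable p-values
both = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(both.bse)   # large standard errors on the collinear pair

# Drop the redundant predictor: the estimate and its standard error stabilize
one = sm.OLS(y, sm.add_constant(x1)).fit()
print(one.bse)
```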
Q: Can you only do model validation with a large amount of data? What is the minimum amount of data you should have to be able to hold back some of it?
A: If there is too little data, there is a chance of overfitting. It also depends on the type of model you are trying to fit. You can try with 50 to 100 rows, and the more data, the better.
Q: When you filtered the model on either training or holdback rows, was the model retraining on the holdback rows instead of applying the original one?
A: The training set is used to train the model. The holdback portion is withheld during fitting, so the model treats it as unseen data. After training is done, the model is applied to the holdback portion to see how well it predicts unseen data; it is not retrained on those rows.
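The same train/holdback workflow can be sketched outside JMP with scikit-learn (the split fraction and model are arbitrary choices for illustration): the model is fit only on the training rows and merely scored on the holdback rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
X = rng.normal(size=(150, 3))
y = X @ np.array([1.5, -2.0, 0.0]) + rng.normal(scale=0.5, size=150)

# Hold back 25% of the rows; the model never sees them during fitting
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.25, random_state=4)

model = LinearRegression().fit(X_train, y_train)     # trained on training rows only
print("training R^2:", model.score(X_train, y_train))
print("holdback R^2:", model.score(X_hold, y_hold))  # scored, not refit
```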
Q: Should we use a Box-Cox transformation to remove lack of fit?
A: You can. It helps linearize relationships between predictors and response.
Q: What's the lambda limit for determining that you need to refit when doing the Box-Cox transformation?
A: Essentially, you are looking at where the red line comes down and crosses the blue horizontal line. Anything in that area you probably don't need to refit, but if it's above that horizontal line, you probably would want to refit.
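Outside JMP, SciPy's boxcox can report both the maximum-likelihood lambda and its profile-likelihood confidence interval, which is the analogue of where the curve crosses the horizontal cutoff line (the data here are simulated for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.lognormal(mean=0.0, sigma=0.6, size=200)  # positive, right-skewed data

# Passing alpha=0.05 also returns the 95% confidence interval for lambda
transformed, lam, (ci_low, ci_high) = stats.boxcox(x, alpha=0.05)
print(f"lambda = {lam:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")

# If a "no transform" value such as lambda = 1 falls inside the interval,
# refitting with a transformed response is unlikely to change the conclusions.
```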
Q: Is there a way to put "limits" on trees, for example, depth?
A: In standard JMP, when you launch Predictor Screening, you can specify the number of trees. In JMP Pro, there are more ways to tune the model. The Predictor Selection Assistant add-in, developed by Scott Allen, another of JMP's Systems Engineers, will also help with drawing more of a "line" in Predictor Screening.
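For comparison, tree ensembles outside JMP expose these limits directly; here is a scikit-learn sketch with arbitrary settings (these are sklearn's knobs, not JMP Pro's tuning options):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=6)

# Explicit limits on the ensemble: number of trees, tree depth, and leaf size
forest = RandomForestRegressor(
    n_estimators=200,      # number of trees (the setting Predictor Screening exposes)
    max_depth=4,           # cap on the depth of each tree
    min_samples_leaf=5,    # minimum rows per terminal node
    random_state=6,
).fit(X, y)
print(forest.score(X, y))
```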
Q: When you say Weibull and normal look good, how are you determining that?
A: In my example, I did not include a validation column. If you don’t have a validation column, you can use an information criterion to determine which model is best. You compare the AICc values for the different models (for example, the AICc for Weibull vs. the AICc for Normal). The magnitudes of these values are not important; what matters is comparing them to each other, so anything within a few units you would consider to be the same. Weibull and LogNormal are pretty close here, but both are substantially smaller (lower AICc) than Exponential and Normal. Another approach would have been to include the validation column, which would have added another column reporting the Validation R-squared or Test R-squared. Looking at the R-squared of that true holdout and comparing models on the test set is another way to determine the best model. See the video below:
This video shows how to build predictive models for correlated and high dimensional data using JMP Pro.
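The AICc comparison can be reproduced outside JMP with SciPy; this sketch (simulated lifetime data and a hand-rolled aicc helper, both assumptions for illustration) fits several distributions by maximum likelihood and compares their AICc values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
data = stats.weibull_min.rvs(c=1.8, scale=10.0, size=150, random_state=7)

def aicc(dist, data):
    """Fit dist by maximum likelihood and return its small-sample AICc."""
    params = dist.fit(data)                    # MLE of all parameters
    loglik = dist.logpdf(data, *params).sum()  # log-likelihood at the fit
    k, n = len(params), len(data)              # k = number of fitted parameters
    return 2 * k - 2 * loglik + 2 * k * (k + 1) / (n - k - 1)

for name, dist in [("Weibull",     stats.weibull_min),
                   ("LogNormal",   stats.lognorm),
                   ("Normal",      stats.norm),
                   ("Exponential", stats.expon)]:
    print(f"{name:12s} AICc = {aicc(dist, data):.1f}")
# Models within a few AICc units of the minimum are practically equivalent.
```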
Q: What should one do if multicollinear predictors are significant?
A: Don't trust that, because your p-values and your model are not reliable when there is collinearity or multicollinearity. You have to handle multicollinearity before you can run any tests of significance for the parameter estimates or the model, or make predictions.