samalar
Level I

Prediction formula null values

Hello,

 

After saving my prediction formula from GenReg, I'm seeing that about 80 percent of the rows in my dataset have null values (.) in the predicted values column. I suspect it has something to do with several "levels removed" in some of my significant predictors. Any advice on how to handle this, or how to pivot from it to get a higher yield of predicted values, would be appreciated. Thank you :)

4 REPLIES
samalar
Level I

Re: Prediction formula null values

Bump to see if anyone can respond. Thanks!
uday_guntupalli
Level VIII

Re: Prediction formula null values

@samalar,
   Can you share a reproducible example so people can step through your workflow?

   You don't have to share any confidential data - you can either anonymize your data or use the sample data sets in JMP if possible. 

Best
Uday
samalar
Level I

Re: Prediction formula null values

Thanks for taking a look. GenReg is modeling 6 predictors to estimate Total_time. Under Effect Tests (see screenshot below), var1 is highly significant but has 4 levels removed; count1 is fine because it is a continuous variable. Var2, Var3, and Var4 each have several levels removed. The screenshot of the prediction formula for Total_time shows that about 90% of the calculated values are null. I understand that I will have to reconfigure the categorical variables. Can you help explain how to handle "levels removed"? Is this why the model applies to only 10% of the data?

 

[Screenshots: Total_time1.png, Total_time2.png]

Re: Prediction formula null values

A categorical factor with k levels requires k-1 parameter estimates. You have many levels for each of your categorical variables, which translates into a model with many parameters to estimate. You don't have enough data to estimate your model: there are only 98 observations in the training set and 43 in the validation set. There is no way to validate your model since you don't have observations with each level of all of those categories. JMP does its best to provide a fit to the data, but you need more data. You should rethink what model you wish to fit and the format of your data.
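
To make the k-1 arithmetic concrete, here is a rough sketch in Python (not JSL, and not anything GenReg runs internally) that counts the parameters in a main-effects model and compares the total with the 98 training observations. The level counts are made up for illustration, since the thread does not state them, and the helper name `main_effects_param_count` is hypothetical.

```python
def main_effects_param_count(categorical_levels, n_continuous):
    """Count parameters in a main-effects model:
    1 intercept + (k - 1) per categorical factor + 1 per continuous predictor."""
    dummy_params = sum(k - 1 for k in categorical_levels.values())
    return 1 + dummy_params + n_continuous


# ASSUMPTION: these level counts are invented for illustration only;
# substitute the real number of levels from your own data table.
levels = {"var1": 12, "Var2": 20, "Var3": 15, "Var4": 25}

n_train = 98  # training-set size stated in the thread
n_params = main_effects_param_count(levels, n_continuous=1)  # count1 is continuous

print(f"parameters to estimate: {n_params}")
print(f"training observations:  {n_train}")
print(f"observations per parameter: {n_train / n_params:.2f}")
```

With these invented counts the model would need 70 parameter estimates from 98 rows, which is well under two observations per parameter. Running the same count on your real level counts should show quickly whether combining sparse levels or collecting more data is the more realistic route.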

Dan Obermiller