
Prediction formula null values

samalar

New Contributor

Joined:

Oct 2, 2017

Hello,

 

After saving my prediction formula from GenReg, I see that about 80 percent of the rows in my dataset have missing values (.) in the predicted values column. I suspect it has something to do with the several "levels removed" in some of my significant predictors. Any advice on how to handle this, or how to change my approach so that more rows get a predicted value, would be appreciated. Thank you :)

4 REPLIES
samalar

New Contributor

Joined:

Oct 2, 2017

Bump to see if anyone can respond. Thanks!
uday_guntupalli

Community Trekker

Joined:

Sep 15, 2014

@samalar,
   Can you share a reproducible example so people can step through your workflow?

   You don't have to share any confidential data - you can either anonymize your data or use the sample data sets in JMP if possible. 

Best
Uday
samalar

New Contributor

Joined:

Oct 2, 2017

Thanks for taking a look. GenReg is modeling 6 predictors to estimate Total_time. Under Effect Tests (see the first screenshot below), var1 is highly significant but has 4 levels removed; count1 is fine because it is a continuous variable. Var2, Var3, and Var4 each have several levels removed. The screenshot of the prediction formula for Total_time shows that about 90% of the calculated values are null. I understand that I will have to reconfigure the categorical variables. Can you help explain how to handle "levels removed"? Is this why the model applies to only 10% of the data?

 

[Screenshots: Total_time1.png, Total_time2.png]

Dan_Obermiller

Joined:

Apr 3, 2013

A categorical factor with k levels requires k-1 parameter estimates. You have many levels for each of your categorical variables, which translates into a model with many parameters to estimate. You don't have enough data to estimate that model: there are only 98 observations in the training set and 43 in the validation set. There is no way to validate your model, since you don't have observations at each level of all of those categories. JMP does its best to provide a fit to the data, but you need more data. You should rethink what model you wish to fit and the format of your data.
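
To make the k-1 arithmetic concrete, here is a rough sketch in Python (not JSL, and not what GenReg does internally). The level counts are hypothetical, since the real ones are only visible in the screenshots, and the function name `count_parameters` is made up for illustration. It counts main-effect parameters only, for the four named categorical factors plus the continuous count1.

```python
def count_parameters(categorical_levels, n_continuous=0, intercept=True):
    """Count main-effect parameters: k-1 per categorical factor, 1 per continuous one."""
    n_params = sum(k - 1 for k in categorical_levels.values())
    n_params += n_continuous
    if intercept:
        n_params += 1
    return n_params


# Hypothetical level counts; substitute the real ones from your data table.
levels = {"var1": 12, "var2": 20, "var3": 15, "var4": 25}
n_train = 98  # training rows reported in this thread

p = count_parameters(levels, n_continuous=1)  # count1 is the continuous predictor
print(f"parameters to estimate: {p}")
print(f"training rows per parameter: {n_train / p:.1f}")
```

With level counts like these, the model would need 70 parameters from 98 training rows, which is well under two rows per parameter. Plugging in your actual level counts should show how far the model has to shrink, for example by combining sparse levels, before the data can support it.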

Dan Obermiller