NullHorse120
Level I

Removing 0 values to improve model fitting for academic purposes

Hi there,

I have a data file containing volatile concentrations measured for 34 different sample formulations. I'm trying to model the volatiles found (response) as a function of the formulations (predictors). However, because not all formulations produce the same volatiles, I have quite a few 0 values (not detected) in my data. This causes much of my volatile data to follow a SASH or Johnson distribution. Removing the samples with a 0 value for certain volatiles would give my data a better distribution for modeling, but of course this is also manipulating the data to my advantage. Another disadvantage is that my sample pool would get smaller. For example, if an ingredient that is responsible for one particular volatile is present in only half of my samples (meaning the volatile would not be formed in the other half and would thus be 0), the sample set would be halved.
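To make the sample-size concern concrete, here is a minimal Python sketch on synthetic data. Only the count of 34 formulations comes from the post; the alternating ingredient pattern, the lognormal concentrations, and all variable names are assumptions made up for illustration, not the actual data.

```python
import random

random.seed(0)  # fixed seed so the synthetic numbers are reproducible

N_FORMULATIONS = 34  # from the post; everything else below is hypothetical

# Assumed design: the precursor ingredient is in every other formulation.
has_ingredient = [i % 2 == 0 for i in range(N_FORMULATIONS)]

# Assumed response: not detected (0.0) without the ingredient,
# otherwise a positive, right-skewed concentration.
concentration = [
    random.lognormvariate(1.0, 0.5) if present else 0.0
    for present in has_ingredient
]

# Dropping the not-detected samples keeps only the positive values.
detected = [c for c in concentration if c > 0]

print(f"all samples:      n = {len(concentration)}")
print(f"after dropping 0: n = {len(detected)}")
```

Under these assumptions, filtering out the zeros leaves 17 of the 34 samples, which is the halving described above.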

In a way, I think it makes sense to remove the 0 values, as I only need the ingredients responsible for forming the desired volatile(s) in the model in the first place. However, I'm quite uncertain whether this method would be valid if I were to publish this in a peer-reviewed journal. Does anybody have experience with this, and with whether removing the samples with 0 values would be accepted?

Thanks a lot in advance!

3 REPLIES

Re: Removing 0 values to improve model fitting for academic purposes

In JMP Pro, see the Generalized Regression examples for fitting models with non-normal response distributions such as Poisson or binomial.

NullHorse120
Level I

Re: Removing 0 values to improve model fitting for academic purposes

Thank you for your reply. I see that these distributions can only be used on integer data, but my data is non-integer. Is there another solution?

Re: Removing 0 values to improve model fitting for academic purposes

Good point, sorry to have missed that! Although I agree that your approach makes sense from a practical perspective, since your question is about suitability for publication, I would defer to others with more statistics knowledge. I am curious to see what others think. 

Another idea that comes to mind: have you tried Neural Networks or Partition Models (Decision Trees)? The Neural model provides maximum flexibility, while the Decision Tree is a nice balance of flexibility and interpretability. 
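As a rough illustration of the partition idea, the Python sketch below finds the single split on one predictor that minimizes the summed squared error of the response. It is a toy under stated assumptions, not JMP's Partition platform algorithm: the function name `best_split`, the ingredient levels, and the concentrations are all hypothetical.

```python
def best_split(x, y):
    """Return (threshold, left_mean, right_mean) for the single split on x
    that minimizes the total within-group squared error of y."""
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    levels = sorted(set(x))
    if len(levels) < 2:
        raise ValueError("need at least two distinct predictor values")

    best = None
    # Candidate thresholds are midpoints between adjacent distinct x values.
    for lo, hi in zip(levels, levels[1:]):
        threshold = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, threshold,
                    sum(left) / len(left), sum(right) / len(right))
    return best[1:]


# Hypothetical data: ingredient amount and the volatile it produces.
ingredient = [0.0, 0.0, 0.0, 0.0, 1.0, 1.5, 2.0, 2.5]
volatile = [0.0, 0.0, 0.0, 0.0, 2.1, 2.9, 4.2, 4.8]

threshold, left_mean, right_mean = best_split(ingredient, volatile)
print(f"split at ingredient <= {threshold}")
print(f"mean below: {left_mean}, mean above: {right_mean:.2f}")
```

With this made-up data, the best split falls between "ingredient absent" and "ingredient present", so the not-detected samples end up in their own group without being removed from the data set.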
