Discussions

NullHorse120 · Feb 18, 2026 02:02 AM

Hi there,

I have a data file containing volatile concentrations measured for 34 different sample formulations. I'm trying to model the volatiles found (responses) as a function of the formulations (predictors). However, because not all formulations produce the same volatiles, my data contain quite a few 0 values (not detected). As a result, much of my volatile data follows a SASH or Johnson distribution. Removing the samples with a 0 value for certain volatiles would give a better-behaved distribution for modeling, but of course this also amounts to manipulating the data to my advantage. Another disadvantage is that my sample pool would get smaller. For example, if an ingredient that is responsible for a certain volatile is present in only half of my samples (meaning the volatile is not formed in the other half and is therefore 0), the sample set would be halved.

In a way, I think it makes sense to remove the 0 values, since the model only needs the ingredients responsible for forming the desired volatile(s) in the first place. But I'm quite uncertain whether this method would be considered valid if I were to publish it in a peer-reviewed journal. Does anybody have experience with this, and with whether removing the samples with 0 values would be accepted or not?

Thanks a lot in advance!

christian-z · Feb 18, 2026 07:01 AM

In JMP Pro, see the Generalized Regression examples for fitting models to non-normal distributions such as Poisson, binomial, etc.

NullHorse120 · Feb 23, 2026 02:41 AM

Thank you for your reply. I see that these distributions can only be used with integer data, but my data are non-integer. Is there another solution?

christian-z · Feb 24, 2026 05:40 AM

Good point, sorry to have missed that! Although I agree that your approach makes sense from a practical perspective, since your question is about suitability for publication, I would defer to others with more statistics knowledge. I am curious to see what others think.

Another idea that comes to mind: have you tried Neural Networks or Partition Models (Decision Trees)? The Neural model provides maximum flexibility, while the Decision Tree is a nice balance of flexibility and interpretability.
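To make the Decision Tree idea a bit more concrete, here is a rough sketch in plain Python (not JSL, and not JMP's Partition platform) of the single-split step that a regression tree repeats. Everything in it is made up for illustration: the data are synthetic, and the names `precursor`, `filler`, `volatile`, `sse`, and `best_split` are hypothetical. The assumption is that the volatile is only formed when the precursor ingredient is present, which is the situation described in the original question.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, response):
    """Return (feature, threshold, total_sse) for the one split that
    minimizes the summed SSE of the two resulting groups."""
    best = None
    features = [k for k in rows[0] if k != response]
    for feat in features:
        levels = sorted({r[feat] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(levels, levels[1:]):
            thr = (lo + hi) / 2
            left = [r[response] for r in rows if r[feat] <= thr]
            right = [r[response] for r in rows if r[feat] > thr]
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (feat, thr, total)
    return best


# Synthetic data (assumption): 34 formulations, half without the precursor.
random.seed(0)
rows = []
for i in range(34):
    precursor = 0.0 if i % 2 == 0 else random.uniform(2.0, 4.0)
    filler = random.uniform(0.0, 10.0)  # unrelated ingredient
    # Volatile is 0 (not detected) unless the precursor is present.
    volatile = 0.0 if precursor == 0.0 else 2.0 * precursor + random.gauss(0.0, 0.3)
    rows.append({"precursor": precursor, "filler": filler, "volatile": volatile})

feat, thr, total = best_split(rows, "volatile")
left = [r["volatile"] for r in rows if r[feat] <= thr]
print(f"first split: {feat} <= {thr:.2f}; left group size = {len(left)}")
```

On this toy data the first split separates the formulations without the precursor (all 0) from the rest, so all 34 samples stay in the model and the zeros end up in their own branch. Whether that holds for real data, and whether reviewers would accept it, is a separate question.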

Removing 0 values to improve model fitting for academic purposes
