Hi there,
I have a data file containing volatile concentrations measured for 34 different sample formulations. I'm trying to model the volatiles found (responses) as a function of the formulations (predictors). However, because not all formulations produce the same volatiles, my data contain quite a few 0 values (not detected). As a result, many of my volatile responses follow a SHASH or Johnson distribution. Removing the samples with a 0 value for a given volatile would give my data a better distribution for modeling, but it is also a manipulation of the data to my advantage. Another disadvantage is that my sample pool would get smaller. For example, if the ingredient responsible for a certain volatile is present in only half of my samples, that volatile is not formed in the other half (so it is 0 there), and removing those samples would halve the sample set.
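To make the halving example concrete, here is a minimal Python sketch on made-up data. Only the count of 34 formulations comes from my set; the ingredient pattern, the concentrations, and the helper `drop_zero_rows` are hypothetical. It counts how many formulations survive when the zero rows are removed and checks which ingredient levels remain in the reduced set.

```python
"""Sketch: how many formulations survive if samples with a 0 (not detected)
volatile concentration are removed. All data below are hypothetical."""

N_FORMULATIONS = 34

# Hypothetical design: the precursor ingredient is in every other formulation,
# and the volatile is only formed (non-zero) when that ingredient is present.
has_ingredient = [i % 2 == 0 for i in range(N_FORMULATIONS)]
volatile = [1.0 + 0.1 * i if present else 0.0
            for i, present in enumerate(has_ingredient)]


def drop_zero_rows(response, predictor):
    """Keep only the rows where the response was detected (non-zero)."""
    kept = [(r, p) for r, p in zip(response, predictor) if r != 0.0]
    return [r for r, _ in kept], [p for _, p in kept]


kept_response, kept_predictor = drop_zero_rows(volatile, has_ingredient)
print(f"samples kept: {len(kept_response)} of {N_FORMULATIONS}")
# In the reduced set the ingredient indicator takes only one value,
# so its effect on the volatile can no longer be estimated from these rows.
print("ingredient levels left:", sorted(set(kept_predictor)))
```

Run as is, it reports 17 of 34 samples kept, with only `True` left for the ingredient indicator.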
In a way, removing the 0 values makes sense to me, since the model only needs to include the ingredients responsible for forming the desired volatile(s) in the first place. However, I'm quite unsure whether this approach would be considered valid if I were to publish it in a peer-reviewed journal. Does anybody have experience with this, and would removing the samples with 0 values be accepted or not?
Thanks a lot in advance!