Discussions

Florent_M · Apr 3, 2025 10:30 AM

Hello!

I have completed a DoE in order to model the particle size distribution. The distribution is measured by several screens. For each size range (e.g. <1mm, 1-5 mm, >5 mm), I have the proportion of particles, and the sum must be one.

My question is how to model the three responses properly, taking into account that the sum of the responses must equal 1.

I have included a synthetic table to illustrate the problem.

I would be very grateful for any advice you could give me.

Thanks,

Florent

Victor_G · Apr 3, 2025 10:51 AM

Hi @Florent_M !

If you have access to the raw data (the full particle size distribution) and a JMP Pro licence, you could use the Functional Data Explorer directly instead of binning your data.

Other workarounds are possible with the Fit Curve or Fit Y by X platforms: extract the curve parameters, model the influence of the factors on those parameters, and rebuild the curves from the predicted parameters.

If you want to keep the data as it is, you could use Partial Least Squares models to leverage the correlations between the responses (induced by the sum constraint).

I added a script in your table with the PLS option.

Hope this first idea may help,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

Florent_M · Apr 8, 2025 6:15 AM

Dear Victor,
Thank you for your help!
I do not have access to the full distribution due to the method of measurement (successive screens).
PLS is a good idea and keeps the sum of the predicted y's equal to 1, but unfortunately it does not force each y to be > 0.
I finally found a trick inspired by neural networks: using the softmax function.

The first step is to invert the softmax function to obtain scores s_i, where y_i are the measured proportions, C is a constant that can be set to 0, and beta is a tuning parameter that can be set to 1.

Then, fit a model to each s_i.

Finally, transform the model predictions back with the softmax function.
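Assuming the standard softmax with scale parameter beta (the notation below is a sketch consistent with the description above, not necessarily the exact form used in the post), the two transformations can be written as:

```latex
% Inverse softmax: from measured proportions y_i to unconstrained scores s_i
\[
  s_i = \frac{\ln y_i}{\beta} + C
\]

% Softmax: from predicted scores \hat{s}_i back to proportions \hat{y}_i
\[
  \hat{y}_i = \frac{\exp(\beta\,\hat{s}_i)}{\sum_{j} \exp(\beta\,\hat{s}_j)}
\]
```

The constant C cancels in the softmax ratio, which is why it can be set to 0. Every exponential is positive and the denominator is their sum, so the predicted proportions are positive and add up to 1.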

This worked quite well for the attached example. It forces the model predictions to be positive and their sum to be 1.
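The three steps can be sketched in plain Python as follows. This is only an illustration under stated assumptions: a single hypothetical factor `x`, made-up proportions (not the attached table), a simple straight-line fit standing in for Fit Model, and a small `eps` floor to avoid ln(0) when a size class is empty (the post does not say how zeros were handled).

```python
import math

def inverse_softmax(y, beta=1.0, c=0.0, eps=1e-6):
    """Map proportions y_i to scores s_i = ln(y_i) / beta + c."""
    # eps is an assumption: it guards against ln(0) for empty size classes.
    return [math.log(max(p, eps)) / beta + c for p in y]

def softmax(s, beta=1.0):
    """Map scores back to positive proportions that sum to 1."""
    m = max(s)  # subtract the max for numerical stability
    e = [math.exp(beta * (v - m)) for v in s]
    total = sum(e)
    return [v / total for v in e]

def fit_line(x, s):
    """Ordinary least squares for s = a + b * x (one factor)."""
    n = len(x)
    xbar, sbar = sum(x) / n, sum(s) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (si - sbar) for xi, si in zip(x, s)) / sxx
    return sbar - b * xbar, b

# Made-up example: one coded factor, three size classes (<1 mm, 1-5 mm, >5 mm).
x = [-1.0, -0.5, 0.0, 0.5, 1.0]
Y = [
    [0.70, 0.25, 0.05],
    [0.55, 0.35, 0.10],
    [0.40, 0.40, 0.20],
    [0.25, 0.40, 0.35],
    [0.10, 0.30, 0.60],
]

# Step 1: invert the softmax for every run.
scores = [inverse_softmax(row) for row in Y]

# Step 2: fit one model per score column.
models = [fit_line(x, [row[k] for row in scores]) for k in range(3)]

# Step 3: predict the scores, then map them back with softmax.
def predict(x_new):
    s_hat = [a + b * x_new for a, b in models]
    return softmax(s_hat)

pred = predict(1.5)  # extrapolation beyond the design region
print(pred)
```

Even when extrapolating, each predicted proportion stays strictly positive and the three add up to 1, because the constraint is enforced by the back-transformation rather than by the fitted models.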

statman · Apr 3, 2025 8:25 AM

Florent, here are my thoughts. I agree with Victor: your best bet is to get the actual distributions for each treatment. From those you can model the median (usually preferable to the mean, since particle size distributions are typically not normally distributed) and some measure of dispersion (e.g., standard deviation, which is fairly robust to distributional issues, see Shewhart) as separate response variables. Make sure you plot the distributions for each treatment. Sometimes you can "create" a response variable that better describes each distribution and therefore which factors are influencing the particle size.

Interestingly, the metric you are trying to use is not really 3 independent response variables. As you note, there is a distribution of particle size for each treatment. Your categorization into 3 categories to describe the distributions is creative, and calculating a proportion for each category is also creative, but unfortunately categorical Y's are not very effective, as they often lack discrimination (especially when you only have 3 categories).

On the other hand, it might be more interesting to see how the proportions change with the treatments. I would go ahead and analyze each category as a separate Y. You would be trying to answer the questions: do any of the model effects impact the proportion of particles that are <1 mm? Or 1-5 mm? Or >5 mm?

Should those response proportions correlate? If they don't, why?
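As a minimal sketch of analyzing each category as a separate Y, the plain-Python example below fits one straight line per size class. The single factor `x` and the proportions are made up for illustration and are not the attached table; `fit_line` is a hypothetical helper, not a JMP function.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x (one factor)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b * xbar, b

# Made-up runs: one coded factor, proportions for <1 mm, 1-5 mm, >5 mm.
x = [-1.0, -0.5, 0.0, 0.5, 1.0]
Y = [
    [0.70, 0.25, 0.05],
    [0.55, 0.35, 0.10],
    [0.40, 0.40, 0.20],
    [0.25, 0.40, 0.35],
    [0.10, 0.30, 0.60],
]

# One separate least-squares fit per size class.
models = [fit_line(x, [row[k] for row in Y]) for k in range(3)]

# Predict all three proportions at a setting outside the design region.
x_new = 1.5
preds = [a + b * x_new for a, b in models]
print(preds, sum(preds))
```

Because all three fits use the same model terms and include an intercept, the predicted proportions still add up to 1. Nothing keeps an individual prediction inside [0, 1], though: here the first class is predicted slightly below zero when extrapolating.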

I did notice a strange value in row 20 for factor X1 (0.31). Is it intended?

I ran Fit Model (least squares fit) and a Multivariate analysis and added the scripts to your table.

"All models are wrong, some are useful" G.E.P. Box


How to model responses when their sum is equal to 1?
