Florent_M
Level IV

How to model responses when their sum is equal to 1?

Hello!

I have completed a DoE to model a particle size distribution. The distribution is measured with a series of screens. For each size range (e.g. <1 mm, 1-5 mm, >5 mm), I have the proportion of particles, and the proportions must sum to one.

 

My question is how to model the three responses properly, taking into account that the sum of the responses must equal 1.

 

I have included a synthetic table to illustrate the problem.

 

I would be very grateful for any advice you could give me.

 

Thanks,

Florent

3 REPLIES
Victor_G
Super User

Re: How to model responses when their sum is equal to 1?

Hi @Florent_M !

 

If you have access to the raw data (the full particle size distribution) and a JMP Pro licence, you could use the Functional Data Explorer directly instead of binning your data.

There are other possible workarounds using the Fit Curve or Fit Y by X platforms: extract the curve parameters, model the influence of the factors on those parameters, and rebuild the curves from the predicted parameters.

 

If you want to keep the data as it is, you could use Partial Least Squares models to leverage the correlations between the responses (induced by the sum constraint):

[Image: Victor_G_0-1743692184390.png]

I added a script in your table with the PLS option.

  

Hope this first idea may help,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
Florent_M
Level IV

Re: How to model responses when their sum is equal to 1?

Dear Victor,
Thank you for your help! 
I do not have access to the full distribution due to the method of measurement (successive screens).
The PLS is a good idea and makes the sum of the ys equal to 1, but unfortunately it does not force the ys to be >0.
I finally found a trick inspired by neural networks: using the softmax function.

 

The first step is to invert the softmax function, where the y_i are the measurements, C is a constant that can be set to 0, and beta is a tuning parameter that can be set to 1:
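
Written out under the standard softmax definition with a scale parameter beta (an assumption; only the roles of y_i, C and beta are described in the text), the inverse would read:

```latex
s_i = \frac{\ln y_i}{\beta} + C
```

With C = 0 and beta = 1 this is just s_i = ln(y_i), so every y_i must be strictly positive.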

 

 

Then, fit a model to each s_i:

 

Finally, transform the model predictions with the softmax function.
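
Under the same assumed parameterization, with \hat{s}_i denoting the predicted score for size class i (notation introduced here for illustration), the back-transform would be:

```latex
\hat{y}_i = \frac{\exp(\beta \hat{s}_i)}{\sum_{j} \exp(\beta \hat{s}_j)}
```

Each \hat{y}_i is positive because it is a ratio of exponentials, and the \hat{y}_i sum to 1 because they share the same denominator. The constant C cancels in the ratio, which is why it can be set to 0.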

 

 

This worked quite well for the attached example. It forces the model predictions to be positive and to sum to 1.
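
For readers outside JMP, the three steps can be sketched in Python with NumPy. This is only a minimal sketch under stated assumptions: the data are synthetic (not the attached table), the model is a plain least-squares fit with an intercept and two main effects, and the helper names `inverse_softmax` and `softmax` are hypothetical.

```python
import numpy as np

def inverse_softmax(y, beta=1.0, c=0.0):
    """Map proportions (all > 0) to unconstrained scores s_i."""
    y = np.asarray(y, dtype=float)
    return np.log(y) / beta + c

def softmax(s, beta=1.0):
    """Map scores back to proportions that are positive and sum to 1 per row."""
    z = beta * np.asarray(s, dtype=float)
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Synthetic stand-in data: 12 runs, 2 factors, 3 size classes (an assumption).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(12, 2))
true_scores = np.column_stack([
    0.5 + X[:, 0],
    -0.2 * X[:, 1],
    0.3 - X[:, 0] + X[:, 1],
])
Y = softmax(true_scores + rng.normal(scale=0.05, size=true_scores.shape))

# Step 1: invert the softmax (beta = 1, C = 0).
S = inverse_softmax(Y)

# Step 2: fit a least-squares model (intercept + main effects) to each s_i.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, S, rcond=None)

# Step 3: transform the predicted scores with the softmax.
Y_hat = softmax(A @ coef)

print(Y_hat.min() > 0, np.allclose(Y_hat.sum(axis=1), 1.0))
```

One caveat with this approach: a measured proportion of exactly 0 has no finite logarithm, so such values would need a small offset (followed by renormalization) before step 1.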

statman
Super User

Re: How to model responses when their sum is equal to 1?

Florent, here are my thoughts. I agree with Victor: your best bet is to get the actual distributions for each treatment. From those you can model the median (this is usually the case, as particle distributions are typically not normally distributed) and some measure of dispersion (e.g., standard deviation, fairly robust to distributional issues, see Shewhart) as separate response variables. Make sure you plot the distributions for each treatment. Sometimes you can "create" a response variable that better describes each distribution and therefore which factors are influencing the particle size.

 

Interestingly, the metric you are trying to use is not really 3 independent response variables. As you note, there is a distribution of particle size for each treatment. Your categorization into 3 categories to describe the distributions is creative, and calculating a proportion for each category is also creative, but unfortunately categorical Y's are not very effective as they often lack discrimination (especially when you only have 3 categories).

 

On the other hand, it might be more interesting to see how the proportions change with the treatments. I would go ahead and analyze each category as a separate Y. You would be trying to answer the questions: Do any of the model effects impact the proportion of particles that are <1 mm? Or 1-5 mm? Or >5 mm?

 

Should those response proportions correlate? If they don't, why?
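
One thing to keep in mind on the correlation question: because the proportions sum to 1, each proportion has zero covariance with the total, so every row of the covariance matrix sums to zero and each proportion is negatively correlated with at least one other. The sketch below illustrates this on synthetic data (independent gamma amounts normalized to proportions, an assumption chosen only for illustration).

```python
import numpy as np

rng = np.random.default_rng(1)

# Independent positive "amounts" per size class (synthetic, for illustration only).
raw = rng.gamma(shape=2.0, size=(200, 3))

# Closing the data to proportions introduces the sum-to-1 constraint.
props = raw / raw.sum(axis=1, keepdims=True)

cov = np.cov(props, rowvar=False)
corr = np.corrcoef(props, rowvar=False)

# Each row of the covariance matrix sums to (numerically) zero.
row_sums = cov.sum(axis=1)
print(np.round(row_sums, 12))
print(np.round(corr, 2))
```

So some negative correlation between the proportions is expected from the constraint alone, even when the underlying amounts are unrelated.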

 

I did notice a strange value in row 20 for factor X1 (0.31)?

 

I ran Fit Model (Least Square Fit) and Multivariate analysis and added the scripts to your table.

"All models are wrong, some are useful" G.E.P. Box
