bpvinjamuri
Level I

Is modeling ratios as continuous variables a right approach?

Dear Fellow JMP Users:

 

I would greatly appreciate your thoughts and opinions on the case below, from an academic research project.

 

Say a DOE has been run to screen a pharmaceutical formulation composition (using materials ‘A’, ‘B’, ‘C’, and ‘D’) with three factors (details below). Factor 1 is the ratio of two materials, ‘A’ and ‘B’, at two levels. The total number of runs was 12.

Factor 1 => Two levels of A:B as a weight-by-weight ratio: 3:1 (-1) and 5:1 (+1)

Factor 2 => Two levels of C in weight percent: 0.5 (-1) and 1.0 (+1)

Factor 3 => Three levels of D in weight percent: 0.5 (-1), 1.0 (0), 2.0 (+1)

Experiment #  Factor 1 (ratio)  Factor 2 (%)  Factor 3 (%)
1             1                 1             1
2             1                 -1            1
3             -1                -1            -1
4             1                 -1            0
5             1                 1             0
6             1                 -1            -1
7             -1                -1            1
8             -1                -1            0
9             -1                1             0
10            -1                1             -1
11            -1                1             1
12            1                 1             -1

*Note: Values in parentheses and in the table are coded values.
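As a minimal sketch of standard linear coding (the low setting maps to -1, the high setting to +1, the midpoint of the range to 0), the snippet below converts between actual and coded values using the low and high settings from the question. The function names are illustrative only. It also makes one thing easy to check: because the Factor 3 levels 0.5, 1.0 and 2.0 % are not equally spaced, linear coding puts 1.0 % at about -0.33 rather than 0, so typing 0 for that level is not the same as linear coding.

```python
# Low and high actual settings for each factor, as listed in the question.
RANGES = {
    "Factor 1 (A:B ratio)": (3.0, 5.0),
    "Factor 2 (C, wt%)": (0.5, 1.0),
    "Factor 3 (D, wt%)": (0.5, 2.0),
}

def to_coded(x, low, high):
    """Linear coding: low -> -1, high -> +1, midpoint of the range -> 0."""
    center = (low + high) / 2
    half_range = (high - low) / 2
    return (x - center) / half_range

def to_actual(z, low, high):
    """Inverse of to_coded: map a coded value back to actual units."""
    return (low + high) / 2 + z * (high - low) / 2

for name, (low, high) in RANGES.items():
    print(f"{name}: {low} -> {to_coded(low, low, high)}, {high} -> {to_coded(high, low, high)}")

# The middle level of Factor 3 (1.0 %) is not the midpoint of 0.5..2.0 % (1.25 %).
print("Factor 3 at 1.0 % in linear coded units:", round(to_coded(1.0, 0.5, 2.0), 3))
```

With this convention, a ratio of 4:1 sits at coded 0 for Factor 1, and any ratio between 3:1 and 5:1 has a well-defined coded value.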

 

My co-researcher wants to treat ‘Factor 1’ as a continuous variable and fit the model with just the CODED values (without using the actual values). But I am afraid this is not the right approach, for two reasons: (i) a ratio is a continuous variable, and (ii) we cannot convert the ratio to coded values and plug them into the prediction equation (from the parameter estimates) to compute an estimated value.

 

Questions to the community members:

  1. Is modeling main effects and two-factor interactions with CODED VALUES, treating ‘Factor 1’ as a continuous variable, the right approach?
  2. If your answer to question 1 is “YES”, how do we plug the ratio into the model equation obtained from the parameter estimates, so that we get the predicted value shown by the software?
  3. If your answer to question 1 is “NO”, please advise how to approach this case.

I will be happy to provide more details as needed. Thanks in advance for your time and help.


Re: Is modeling ratios as continuous variables a right approach?

You said this was a formulation. Does that mean that A+B+C+D = 1? If so, using the ratio of two variables is a common way to avoid fitting a Scheffé mixture model (not that that is difficult to do, but it is quite different from typical regression). I usually only suggest this if the ratio A/B makes some physical sense. If it does not, I would fit a Scheffé mixture model instead.
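To see why a ratio can stand in for A and B in a formulation, here is a hypothetical helper (not part of JMP). It assumes, for illustration only, that A, B, C and D are weight percents that must sum to 100 %. Under that constraint, a chosen A:B ratio r together with C and D fixes A and B uniquely: B = (100 - C - D)/(1 + r) and A = r·B. Settings where C and D leave no room for A and B are rejected.

```python
def split_a_b(ratio, c_pct, d_pct, total=100.0):
    """Return (A, B) in wt% for a given A:B ratio when A + B + C + D must equal total.

    Assumes all four materials share one total (a mixture constraint).
    Raises ValueError if the ratio is not positive or C + D leave no room for A and B.
    """
    if ratio <= 0:
        raise ValueError("A:B ratio must be positive")
    remainder = total - c_pct - d_pct   # what is left for A and B together
    if remainder <= 0:
        raise ValueError("C + D leave no room for A and B")
    b = remainder / (1 + ratio)         # B is one part out of (1 + ratio)
    a = ratio * b                       # A is ratio parts
    return a, b

a, b = split_a_b(ratio=4.0, c_pct=1.0, d_pct=2.0)
print(round(a, 2), round(b, 2))
```

Because the ratio plus the remaining components determine the whole formulation, the ratio can be modeled as an ordinary factor without the mixture constraint appearing in the model.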

 

Assuming it is a formulation as described above (and even if it is not, I guess!), using coded values for ALL of the factors is a good thing to do. In fact, if you created the design in JMP, it automatically adds the Coding column property so that coded values are used for the parameter estimates. Even if JMP was not used to create the design, you will see a version of coding when you put interactions into the model: JMP automatically centers the factors (by subtracting each factor's mean) before multiplying them together.
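To illustrate why centering matters for interaction terms, here is a small check on the 12-run design from the question, in actual units. It assumes every combination of the three factors is run once (which is what the table in the question shows) and centers each factor by its mean before multiplying. In raw units, Factor 1 and the Factor 1 × Factor 2 product are noticeably correlated; after centering, the correlation is zero, so the main effect and the interaction no longer share information.

```python
from itertools import product
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Full factorial in actual units: A:B ratio, C wt%, D wt% (levels from the question).
runs = list(product([3, 5], [0.5, 1.0], [0.5, 1.0, 2.0]))
x1 = [r[0] for r in runs]
x2 = [r[1] for r in runs]

# Raw product term versus product of mean-centered factors.
raw_product = [a * b for a, b in zip(x1, x2)]
m1, m2 = mean(x1), mean(x2)
centered_x1 = [a - m1 for a in x1]
centered_product = [(a - m1) * (b - m2) for a, b in zip(x1, x2)]

corr_raw = pearson(x1, raw_product)
corr_centered = pearson(centered_x1, centered_product)
print(f"corr(x1, x1*x2) raw:      {corr_raw:.3f}")
print(f"corr(x1, x1*x2) centered: {corr_centered:.3f}")
```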

 

To specifically answer your questions:

1. Yes, this is a good approach. All of the factors should be coded.

2. I'm not sure I understand your question, but I will take a stab at it. To model the ratio, you will need to create a column containing the ratio formula. The calculated ratios can then be plugged into your prediction equation. Remember that the ratio A/B should have some physical meaning, so plugging in a ratio value rather than individual A and B values is not typically a problem.

 

If I do understand correctly, what you can do is create your ratio formula column, build your model, and save the prediction formula to the data table. Then use the Prediction Profiler under the Graph menu to profile the prediction expression, making sure to check the "Expand Intermediate Formulas" checkbox. This gives you the profiler in terms of A, B, C, and D rather than Factor 1, Factor 2, and Factor 3. You can then use the profiler to generate predictions, or add rows to the data table to generate predictions. Be aware that this approach will likely allow nonsensical formulations if A+B+C+D = 1. There are ways around this, but it depends on how your design was created.
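A minimal sketch of what the saved prediction formula does once the ratio column is expanded in terms of A and B: compute the A/B ratio from the actual amounts, code each factor linearly from its range, and plug the coded values into a main-effects-plus-two-factor-interaction equation. The coefficients in `COEF` are hypothetical placeholders, not results from this study; replace them with your own parameter estimates. Linear coding of Factor 3 over 0.5 to 2.0 % is also an assumption here.

```python
# Hypothetical coded-unit coefficients (placeholders, not fitted values).
COEF = {
    "intercept": 50.0,
    "x1": 4.0, "x2": -2.0, "x3": 1.5,
    "x1*x2": 0.8, "x1*x3": -0.5, "x2*x3": 0.3,
}

def code(x, low, high):
    """Map an actual setting to coded units (low -> -1, high -> +1)."""
    return (x - (low + high) / 2) / ((high - low) / 2)

def predict(a_amount, b_amount, c_pct, d_pct, coef=COEF):
    """Predict the response from actual settings via the coded-unit equation."""
    ratio = a_amount / b_amount      # Factor 1 is the A:B ratio, e.g. 4.0 for 4:1
    x1 = code(ratio, 3, 5)           # ratio range 3:1 to 5:1
    x2 = code(c_pct, 0.5, 1.0)       # C range 0.5 to 1.0 wt%
    x3 = code(d_pct, 0.5, 2.0)       # D range 0.5 to 2.0 wt%
    return (coef["intercept"]
            + coef["x1"] * x1 + coef["x2"] * x2 + coef["x3"] * x3
            + coef["x1*x2"] * x1 * x2
            + coef["x1*x3"] * x1 * x3
            + coef["x2*x3"] * x2 * x3)

print(predict(a_amount=40, b_amount=10, c_pct=0.75, d_pct=1.25))
print(predict(a_amount=8, b_amount=2, c_pct=0.75, d_pct=1.25))
```

Both calls use a 4:1 ratio at the center of the C and D ranges, so both return the intercept: only the ratio enters the model, not the individual amounts of A and B.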

Dan Obermiller
bpvinjamuri
Level I

Re: Is modeling ratios as continuous variables a right approach?

Thanks for your detailed response, @Dan_Obermiller

 

This is not a mixture. Material D (Factor 3) evaporates during the process, so we decided not to take that approach. Besides, I have had problems with poor predictability when using mixture designs, which is why we went with a factorial design.

 

But I will certainly try creating an A/B ratio column (so that it has some physical meaning) and back-calculating with the model equation to see whether I get the predicted values.
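Back-calculating from coded to actual units is a substitution. With z_i = (x_i - c_i)/h_i, where c_i is the center of a factor's range and h_i its half-range, a coded two-factor model with an interaction can be rewritten in actual units. The sketch below does this for Factor 1 (ratio, c = 4, h = 1) and Factor 2 (C, c = 0.75, h = 0.25) with hypothetical coefficients, and checks that both forms give the same prediction. It assumes linear coding; a three-factor model expands the same way, with more terms.

```python
def coded_to_actual(b0, b1, b2, b12, c1, h1, c2, h2):
    """Convert y = b0 + b1*z1 + b2*z2 + b12*z1*z2 (coded units) to actual units.

    z_i = (x_i - c_i) / h_i, where c_i is the range center and h_i the half-range.
    Returns (a0, a1, a2, a12) for y = a0 + a1*x1 + a2*x2 + a12*x1*x2.
    """
    g = b12 / (h1 * h2)
    a12 = g
    a1 = b1 / h1 - g * c2
    a2 = b2 / h2 - g * c1
    a0 = b0 - b1 * c1 / h1 - b2 * c2 / h2 + g * c1 * c2
    return a0, a1, a2, a12

# Hypothetical coded-unit estimates (placeholders only).
b0, b1, b2, b12 = 50.0, 4.0, -2.0, 0.8
c1, h1 = 4.0, 1.0     # Factor 1: ratio 3..5
c2, h2 = 0.75, 0.25   # Factor 2: C 0.5..1.0 wt%

a0, a1, a2, a12 = coded_to_actual(b0, b1, b2, b12, c1, h1, c2, h2)

# Predict at a ratio of 4.5:1 and 0.6 wt% C both ways.
ratio, c_pct = 4.5, 0.6
z1, z2 = (ratio - c1) / h1, (c_pct - c2) / h2
y_coded = b0 + b1 * z1 + b2 * z2 + b12 * z1 * z2
y_actual = a0 + a1 * ratio + a2 * c_pct + a12 * ratio * c_pct
print(y_coded, y_actual)  # the two forms agree up to floating-point rounding
```

Note that the interaction moves into the actual-unit main-effect coefficients (a1 and a2), which is why coded and actual estimates look different even though the predictions match.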

SDF1
Super User

Re: Is modeling ratios as continuous variables a right approach?

Hi @bpvinjamuri,

 

  I might not be understanding your issue completely, but here are some thoughts:

  1. Yes to your Q1. Factor 1 (ratio) is a continuous variable. The ratio of A:B can vary from 10:1 -> 9:1 -> 8:1 -> ... 1:1 -> 1:2 -> 1:3 ... 1:10, for example. JMP will always work with coded values behind the scenes, so even if you enter the actual values used to formulate the pharmaceutical, JMP will use -1, 1 or -1, 0, 1, depending on settings. You can't enter 3:1 or 5:1 as values, though; you would have to enter 3 and 5, knowing that in the end the value of Factor 1 is a ratio of A:B.
  2. All of your factors are actually continuous, in the sense that they could be varied continuously in principle, even if they are not in practice. Do you need 3 levels for Factor 3? If not, just use the low and high levels. Do you expect the response to be non-monotonic in Factor 3, i.e. could the response rise and then fall (or fall and then rise) as Factor 3 increases? If so, you would need more levels for Factor 3, but you would still treat it as continuous.
  3. What kind of DOE are you running? I'm guessing a Full Factorial. Is this the right kind of DOE for what you're after? If you're trying to predict an outcome, you'll probably want to use the Custom Design platform with an I-optimal design rather than a Full Factorial. If you're just after how the factors interact to produce a response, this should be fine. You can then narrow down the factors and run a more detailed I-optimal design if you want to end up making predictions. RSM designs are I-optimal by default, for example. These require more runs because they try to reduce the prediction variance over the design space.
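The point about the number of levels for Factor 3 can be checked with the model matrix. With only two distinct levels, the columns 1, x and x² cannot all be independent (in coded units x² is constant), so a quadratic (curvature) term cannot be estimated; three distinct levels make it estimable. A sketch with NumPy, using the coded levels as written in the question and 12 runs in each case:

```python
import numpy as np

def quadratic_model_matrix(levels):
    """Columns: intercept, linear term, and quadratic term in one factor."""
    x = np.asarray(levels, dtype=float)
    return np.column_stack([np.ones_like(x), x, x ** 2])

# Factor 3 run at 2 levels versus 3 levels (coded values, 12 runs each).
two_level = [-1, 1] * 6
three_level = [-1, 0, 1] * 4

rank_two = np.linalg.matrix_rank(quadratic_model_matrix(two_level))
rank_three = np.linalg.matrix_rank(quadratic_model_matrix(three_level))
print(rank_two)    # quadratic term not estimable when rank < 3
print(rank_three)  # full rank: quadratic term estimable
```

A rank below the number of columns means at least one term is aliased with the others, which is exactly the situation with a two-level factor and a squared term.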
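To make "prediction variance over the design space" concrete: the I-criterion is the average, over the design region, of the relative prediction variance f(x)ᵀ(XᵀX)⁻¹f(x), and an I-optimal design is chosen to make that average small. The sketch below evaluates it for the 12-run full factorial in coded units with a main-effects-plus-two-factor-interaction model, averaging over a grid on [-1, 1]³. The grid average is a simple approximation chosen for illustration, and error variance is set to 1 so the values are relative.

```python
from itertools import product
import numpy as np

def model_row(x1, x2, x3):
    """Main effects plus two-factor interactions, in coded units."""
    return [1.0, x1, x2, x3, x1 * x2, x1 * x3, x2 * x3]

def average_prediction_variance(design, grid_points=21):
    """Average of f(x)' (X'X)^-1 f(x) over a grid on [-1, 1]^3 (error variance = 1)."""
    X = np.array([model_row(*run) for run in design], dtype=float)
    xtx_inv = np.linalg.inv(X.T @ X)
    axis = np.linspace(-1.0, 1.0, grid_points)
    F = np.array([model_row(*p) for p in product(axis, repeat=3)])
    # Row-wise quadratic form f(x)' (X'X)^-1 f(x) for every grid point.
    variances = np.einsum("ij,jk,ik->i", F, xtx_inv, F)
    return variances.mean()

# The 12-run full factorial from the question (Factor 3 coded -1, 0, +1 as written there).
full_factorial = list(product([-1, 1], [-1, 1], [-1, 0, 1]))
avg_var = average_prediction_variance(full_factorial)
print(f"average relative prediction variance: {avg_var:.3f}")
```

Dropping a run from the design can only increase this average, which is one way to see why prediction-oriented designs tend to need more runs.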

 

Hope this helps!

DS