JunaidM
Level II

DoE for Conditional Continuous factors

I am trying to design an experiment where some factors are conditional, and I am not sure whether this can be handled properly using Custom Design or whether the design should be constructed manually.

I have one factor, X1, with a reference condition at 1, which is also the maximum value. X1 can only be varied below this reference condition, but one case of interest is also keeping it at the reference value.

For the other two factors, the structure is conditional:

  • X2 can be either OFF or ON

    • If OFF, there is no level/intensity
    • If ON, it has a continuous range
  • X3 can be either absent or present

    • If absent, there is no dosage/intensity
    • If present, it has a continuous range

So X2 and X3 are not simple continuous factors, because their continuous values only make sense when the corresponding factor is active/present.

My objective is to understand:

  • The effect of X2 and/or X3 being present versus not present
  • The effect of changing the level/intensity of X2 and/or X3 when they are active
  • How these effects behave at different values of X1
  • Whether there are interactions between X1, X2, and X3

One idea I am considering is to treat X2 and X3 as discrete numeric factors, where 0 represents OFF/absent and the non-zero values represent the active continuous range. For analysis, I would then avoid automatic coding/centering of polynomial terms so that the numeric levels are interpreted more directly.

However, I understand that this approach has drawbacks. In particular, the jump from 0 to the first non-zero level may combine two effects:

  • the effect of switching the factor ON/present
  • the effect of moving to the lowest active intensity/dosage

So it may not cleanly separate the activation/presence effect from the intensity/dosage effect. This could also make interaction terms harder to interpret, especially if X2 and X3 behave differently at different values of X1.
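
To make this confounding concrete, here is a minimal sketch in Python (not JSL). It assumes a hypothetical response in which X2 contributes a fixed activation step plus a linear intensity slope. The function name `x2_contribution` and the values of `activation` and `slope` are illustrative assumptions, not measured effects.

```python
def x2_contribution(level, activation=5.0, slope=2.0):
    """Hypothetical contribution of X2: a step when ON plus a linear dose term."""
    is_on = 1 if level > 0 else 0
    return activation * is_on + slope * level


# Jump from OFF (0) to the lowest active level (1): activation + slope * 1
jump_off_to_first = x2_contribution(1) - x2_contribution(0)

# Step between two active levels (1 -> 2): slope only
step_within_active = x2_contribution(2) - x2_contribution(1)

print(jump_off_to_first)
print(step_within_active)
```

Under this assumed form, the 0 to 1 jump mixes both effects, while the difference between two active levels isolates the slope. Separating the two therefore needs at least two distinct active levels in addition to 0, and it relies on the response being linear within the active range.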

A second approach I am currently trying is to represent X2 and X3 using both a categorical activation factor and a discrete numeric level factor:

  • one categorical OFF/ON factor plus one discrete numeric level factor for X2
  • one categorical absent/present factor plus one discrete numeric level factor for X3


DOE(
    Custom Design,
    {
        Add Response( Maximize, "Y", ., ., . ),

        Add Factor( Continuous, -1, 1, "X1_Level", 0 ),

        Add Factor( Discrete Numeric, {0, 1, 2, 3}, "X2_Level", 0 ),
        Add Factor( Categorical, {"Off", "On"}, "X2_Status", 0 ),

        Add Factor( Discrete Numeric, {0, 1, 2, 3}, "X3_Level", 0 ),
        Add Factor( Categorical, {"Absent", "Present"}, "X3_Status", 0 ),

        Set Random Seed( 2055292721 ),
        Number of Starts( 4702 ),

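        // Model terms use {factor index, power}; factors are numbered in the
        // order they were added: 1 = X1_Level, 2 = X2_Level, 3 = X2_Status,
        // 4 = X3_Level, 5 = X3_Status.
        Add Term( {1, 0} ),                // intercept
        Add Term( {1, 1} ),                // X1_Level main effect

        Add Term( {2, 1} ),                // X2_Level main effect
        Add Potential Term( {2, 2} ),      // X2_Level quadratic, estimated if possible
        Add Term( {3, 1} ),                // X2_Status main effect

        Add Term( {4, 1} ),                // X3_Level main effect
        Add Potential Term( {4, 2} ),      // X3_Level quadratic, estimated if possible
        Add Term( {5, 1} ),                // X3_Status main effect

        Add Term( {1, 1}, {2, 1} ),        // X1_Level * X2_Level
        Add Term( {1, 1}, {4, 1} ),        // X1_Level * X3_Level
        Add Term( {2, 1}, {4, 1} ),        // X2_Level * X3_Level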

        Set Sample Size( 12 ),

        Disallowed Combinations(
            ("X2_Status"n == "Off" & "X2_Level"n > 0) |
            ("X2_Status"n == "On" & "X2_Level"n == 0) |
            ("X3_Status"n == "Absent" & "X3_Level"n > 0) |
            ("X3_Status"n == "Present" & "X3_Level"n == 0)
        ),

        Simulate Responses( 0 ),
        Save X Matrix( 0 ),
        Make Design
    }
);

My concern is that the categorical status factor may be redundant, because the OFF/ON or absent/present status is already implied by the numeric level. I am therefore not sure whether this setup can truly separate the activation effect from the level/intensity effect, or whether it introduces collinearity/confounding that makes the model difficult to interpret.
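
One way to probe this concern is to check the rank of the model matrix for a single status/level pair. The sketch below is in Python rather than JSL and looks at one factor pair in isolation; the helper names (`is_disallowed`, `rank`, `model_row`) are hypothetical, and the three scenarios are assumptions chosen for illustration, not output from JMP.

```python
from fractions import Fraction
from itertools import product

levels = [0, 1, 2, 3]
statuses = ["Off", "On"]


def is_disallowed(status, level):
    """Mirror of the Disallowed Combinations filter for one status/level pair."""
    return (status == "Off" and level > 0) or (status == "On" and level == 0)


# Combinations that survive the filter: the status is fully determined by the level.
allowed = [(s, l) for s, l in product(statuses, levels) if not is_disallowed(s, l)]


def rank(rows):
    """Rank of a small matrix using exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def model_row(status, level, powers):
    """Columns: intercept, status indicator, then the requested powers of level."""
    on = 1 if status == "On" else 0
    return [1, on] + [level ** p for p in powers]


# Intercept + status + linear level, all four levels run: 3 columns.
rank_linear = rank([model_row(s, l, [1]) for s, l in allowed])

# Intercept + status + level, level^2, level^3: 5 columns.
rank_cubic = rank([model_row(s, l, [1, 2, 3]) for s, l in allowed])

# Intercept + status + linear level, but only levels 0 and 3 run: 3 columns.
two_level_runs = [(s, l) for s, l in allowed if l in (0, 3)]
rank_two_level = rank([model_row(s, l, [1]) for s, l in two_level_runs])

print(allowed)
print(rank_linear, rank_cubic, rank_two_level)
```

In this isolated check, the status column is a deterministic function of the level, so it adds no new runs to the design space. It becomes linearly dependent (rank below the number of columns) when the level polynomial is saturated or when only one active level is actually run.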

My questions are:

  1. Is it statistically sensible to include both the status factor and the discrete numeric level factor?
  2. Can this setup meaningfully separate the activation effect from the level/intensity effect?
  3. Would it be better to use only the discrete numeric factors, with 0 = Off/Absent and 1–3 = active levels?
  4. Or is there a better way to handle this type of conditional factor structure in JMP Custom Design?

 

Victor_G
Super User

Re: DoE for Conditional Continuous factors

Hi @JunaidM,

The categorical factors are not necessary, since an X2 level of 0 already implies that this factor is "OFF" (and likewise for X3).
Adding these factors to the design therefore only creates redundant information (and collinearity) during design generation, and it will produce Singularity Details during modeling because of the linear dependency between the status and level factors.
In the design generation script you shared, you can see the result of this redundancy (the dependency between X2_Level and X2_Status): JMP has removed the Xi_Status factors from the model in the Design Evaluation platform, leaving only the terms for the discrete numeric and continuous factors:

[Screenshot of the Design Evaluation report: Victor_G_0-1781107351713.png]

So I would stick with the more direct design generation option.

What is your objective with this design? Are you really interested in testing so many levels for X2 and X3? Could a screening/D-optimal design with 2 levels (one low level for absence, 0, and one high level for presence, for example 3) be sufficient for your needs? Or 3 levels, to analyze quadratic effects and avoid a simple absence/presence setting of the factor levels?
If you really want to enforce these levels, you can still create your design with: 

  • X1 continuous factor (from -1 to 1)
  • X2 discrete numeric with values 0, 1, 2 and 3, or continuous with appropriate model terms to obtain a sufficient number of levels. If you really want 4 levels to be tested, specifying the term X2 at the power of 5 will create 4 levels for this factor in the design. See force levels in DoE for more info.
  • X3 discrete numeric with values 0, 1, 2 and 3, or continuous with appropriate model terms to obtain a sufficient number of levels.

If you want to separate the activation effect from the level/intensity effect, you can still do so in the analysis by comparing the average response when X2 = 0 with the average response when X2 = 1, 2 or 3. It is far easier to summarize information in the analysis when you already have a more granular and detailed response, as you will be able to provide both types of analysis and results: a macro view and a detailed view.
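
The macro and detailed views described above can be sketched as follows. This is a Python illustration rather than JSL, and the `runs` data are invented purely to show the arithmetic; they are not results from any experiment.

```python
from statistics import mean

# Hypothetical (X2 level, response Y) pairs, invented for illustration only.
runs = [(0, 10.0), (0, 12.0), (1, 15.0), (2, 17.0), (3, 19.0), (3, 21.0)]

# Macro view: OFF (X2 = 0) versus ON (X2 = 1, 2 or 3).
off_mean = mean(y for x2, y in runs if x2 == 0)
on_mean = mean(y for x2, y in runs if x2 > 0)
macro_effect = on_mean - off_mean

# Detailed view: mean response at each individual level.
by_level = {}
for x2, y in runs:
    by_level.setdefault(x2, []).append(y)
level_means = {x2: mean(ys) for x2, ys in sorted(by_level.items())}

print(off_mean, on_mean, macro_effect)
print(level_means)
```

Raw averages like these ignore the settings of X1 and X3 in each run, so in an unbalanced design it may be safer to compare predictions from the fitted model at the same conditions.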

Hope this answer will help you,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
JunaidM
Level II

Re: DoE for Conditional Continuous factors

Hi @Victor_G 

Thank you for the detailed response. This is very helpful.

Just to clarify one point: in the script I shared, I had removed the X2_Status and X3_Status terms myself from the model terms because I was already concerned that they may be redundant. But your explanation confirms the concern more clearly: since X2 = 0 already implies OFF and X2 > 0 implies ON, adding a separate status factor would introduce redundant information and potential collinearity.

So, if I understand correctly, the more direct approach (Option 1) does make sense.

Based on your response, I am also thinking that three discrete numeric levels may be sufficient instead of four. For example, 0, lowest practical active level, and highest practical active level. This may be more appropriate because the distance between 0 → lowest active level and lowest → highest active level is not the same. 

Following your suggestion, this is the design structure I think makes more sense. I removed the separate categorical status factors and kept X2 and X3 as discrete numeric factors, where 0 represents the OFF/absent condition and the non-zero values represent active levels.

The model includes main effects, a quadratic effect for X1, and selected two-factor interaction terms:

 

DOE(
	Custom Design,
	{
		Add Response( Maximize, "Y", ., ., . ),

		Add Factor( Continuous, -1, 1, "X1", 0 ),
		Add Factor( Discrete Numeric, {0, 15, 30}, "X2", 0 ),
		Add Factor( Discrete Numeric, {0, 30, 45}, "X3", 0 ),

		Set Random Seed( 1139260218 ),
		Number of Starts( 71985 ),

		// Model terms use {factor index, power}: 1 = X1, 2 = X2, 3 = X3.
		Add Term( {1, 0} ),                // intercept
		Add Term( {1, 1} ),                // X1 main effect
		Add Term( {2, 1} ),                // X2 main effect
		Add Potential Term( {2, 2} ),      // X2 quadratic, estimated if possible
		Add Term( {3, 1} ),                // X3 main effect
		Add Potential Term( {3, 2} ),      // X3 quadratic, estimated if possible
		Add Term( {1, 2} ),                // X1 quadratic
		Add Term( {1, 1}, {2, 1} ),        // X1 * X2
		Add Term( {1, 1}, {3, 1} ),        // X1 * X3
		Add Term( {2, 1}, {3, 1} ),        // X2 * X3

		Set Sample Size( 12 ),
		Simulate Responses( 0 ),
		Save X Matrix( 0 )
	}
);

One point I want to check is about model fitting and interpretation. Since 0 for X2 and X3 represents a real OFF/absent condition, and not just a coded low level, I assume I should be careful with automatic coding or centering of polynomial terms. My concern is that centering may make the model coefficients harder to interpret in relation to the actual OFF/absent state. Would you recommend fitting these terms using the actual numeric values, or is there a better coding strategy for this type of factor?
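
To illustrate the centering concern, here is a minimal Python sketch (not JSL) for a model with one interaction term. It assumes that centering simply subtracts a center value from each factor inside the interaction; the coefficients and the center values `m1` and `m2` are hypothetical, and `to_centered` is a helper written for this example only.

```python
def uncentered(x1, x2, b0, b1, b2, b12):
    """Prediction with the interaction built from raw factor values."""
    return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2


def centered(x1, x2, c0, c1, c2, c12, m1, m2):
    """Prediction with the interaction built from centered factor values."""
    return c0 + c1 * x1 + c2 * x2 + c12 * (x1 - m1) * (x2 - m2)


def to_centered(b0, b1, b2, b12, m1, m2):
    """Convert uncentered coefficients into the equivalent centered ones."""
    return (b0 - b12 * m1 * m2, b1 + b12 * m2, b2 + b12 * m1, b12)


# Hypothetical uncentered coefficients; X1 centered at 0, X2 centered at 15.
b = (10.0, 2.0, 0.5, 0.25)
m1, m2 = 0.0, 15.0
c = to_centered(*b, m1, m2)

# Both parameterizations give the same predictions on the design grid.
grid = [(x1, x2) for x1 in (-1, 0, 1) for x2 in (0, 15, 30)]
max_gap = max(abs(uncentered(x1, x2, *b) - centered(x1, x2, *c, m1, m2)) for x1, x2 in grid)

print(b[1])  # X1 slope when X2 = 0 (the OFF state)
print(c[1])  # X1 slope when X2 = m2 (the center of the X2 range)
print(max_gap)
```

Under these assumptions the two fits describe the same surface, and only the meaning of the lower-order coefficients changes: without centering, the X1 coefficient is the X1 slope at X2 = 0 (OFF), whereas with centering it is the X1 slope at the center of the X2 range.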

 
