ggmst
Level II

How do I set constraints for categorical variable

I am trying to create a design with about 10 categorical variables, each with two levels (0,1).  I would like my design to model the main effects and all two-factor interactions.  When I use the JMP defaults, I get a D-optimal design in which the average number of categorical variables at the "1" level is around 5.  Is there a way to constrain the design so that at most 3 of the variables are at the "1" level in any run?  (I understand this may necessitate more runs, but I am okay with that.)

 

Thank you.

2 ACCEPTED SOLUTIONS

Re: How do I set constraints for categorical variable

When I run into this, I usually convert the categorical into a 0/1 discrete numeric factor, and then use disallowed combinations to ensure the sum doesn't exceed the desired number of 1's. For 10 inputs (main effects model, but just to give you an idea), something like this:

 

DOE(
	Custom Design,
	{Add Response( Maximize, "Y", ., ., . ), Add Factor( Discrete Numeric, {0, 1}, "X1", 0 ),
	Add Factor( Discrete Numeric, {0, 1}, "X2", 0 ), Add Factor( Discrete Numeric, {0, 1}, "X3", 0 ),
	Add Factor( Discrete Numeric, {0, 1}, "X4", 0 ), Add Factor( Discrete Numeric, {0, 1}, "X5", 0 ),
	Add Factor( Discrete Numeric, {0, 1}, "X6", 0 ), Add Factor( Discrete Numeric, {0, 1}, "X7", 0 ),
	Add Factor( Discrete Numeric, {0, 1}, "X8", 0 ), Add Factor( Discrete Numeric, {0, 1}, "X9", 0 ),
	Add Factor( Discrete Numeric, {0, 1}, "X10", 0 ), Set Random Seed( 1550400750 ), Number of Starts( 3017 ),
	Add Term( {1, 0} ), Add Term( {1, 1} ), Add Term( {2, 1} ), Add Term( {3, 1} ), Add Term( {4, 1} ),
	Add Term( {5, 1} ), Add Term( {6, 1} ), Add Term( {7, 1} ), Add Term( {8, 1} ), Add Term( {9, 1} ),
	Add Term( {10, 1} ), Set Sample Size( 16 ), Disallowed Combinations(
		X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10 > 3
	), Make Design}
);
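To get a feel for how much this constraint shrinks the set of candidate runs, here is a minimal Python sketch (not JSL, and not part of the original script) that enumerates the full 2^10 factorial and keeps only the runs the constraint allows. The names `N_FACTORS`, `MAX_HIGH`, and `allowed` are illustrative.

```python
from itertools import product

N_FACTORS = 10   # X1 ... X10, each coded 0/1 as in the script above
MAX_HIGH = 3     # runs with more than this many 1's are disallowed

# Enumerate every 0/1 combination and keep those the constraint permits,
# i.e. those where X1 + ... + X10 > 3 is false.
allowed = [run for run in product((0, 1), repeat=N_FACTORS) if sum(run) <= MAX_HIGH]

print(len(allowed), "of", 2 ** N_FACTORS, "candidate runs remain")
# 176 of 1024 candidate runs remain
```

The optimizer then has to choose its runs from this much smaller candidate set, which is one reason a constrained design can need more runs to reach the same quality.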

 


Re: How do I set constraints for categorical variable

Ryan's approach is exactly what I was going to suggest. However, if you REALLY want to leave your factors as categorical (and assuming they are named x1 - x10), you can specify a Disallowed Combinations script that looks like this:

 

(x1 == 2) + (x2 == 2) + (x3 == 2) + (x4 == 2) + (x5 == 2) + (x6 == 2) + (x7 == 2)
+(x8 == 2) + (x9 == 2) + (x10 == 2) > 3

 

Note that the factors are categorical, so each x# == 2 comparison tests whether that factor is at its second level.
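The logic of that expression can be sketched outside JMP: each comparison contributes 1 when it is true and 0 otherwise, so the sum counts how many factors sit at their second level. A small Python analogue, assuming each run is given as a list of level indices (1 or 2); the function name `is_disallowed` is hypothetical.

```python
def is_disallowed(levels, max_high=3):
    """Return True if more than max_high factors are at their second level.

    levels: level index (1 or 2) of each categorical factor in one run.
    """
    # Each (level == 2) comparison is True/False, which sums as 1/0.
    n_high = sum(level == 2 for level in levels)
    return n_high > max_high

run_three_high = [2, 1, 1, 2, 1, 1, 1, 2, 1, 1]  # three factors at level 2
run_four_high = [2, 2, 1, 2, 1, 1, 1, 2, 1, 1]   # four factors at level 2

print(is_disallowed(run_three_high), is_disallowed(run_four_high))
# False True
```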

Dan Obermiller


ggmst
Level II

Re: How do I set constraints for categorical variable

Thanks Ryan and Dan!
I have the script running and can make the designs. I added all the two-factor interactions between the 10 variables, but now another question has come up. When I constrain to at most 3 factors at the higher level and allow only 60 runs (as in the script), the Design Diagnostics look really horrible (the D-efficiency is very low and the Average Variance of Prediction is very high). It seems that when I make the unconstrained designs through the GUI, there is some background calculation that populates the Minimum and Default number of runs based on the model terms. When I use the script, I can choose how many runs (Ryan chose 60), but is there a way to invoke that background calculation for the Default number of runs?
(What I am really trying to do is make some graphs to understand the tradeoffs between number of two-level variables, max number of variables at higher-level per run, and total number of runs needed to achieve a given Average Variance of Prediction?)

Re: How do I set constraints for categorical variable

You don't need to use the script to create the design. The script is just a convenient way to set up all of the work.

You DO need the script like I had posted for the constraints. Given your desire to try different scenarios, I would recommend going with the GUI, but specify the linear constraints as a "Disallowed combinations script" and paste in the script I provided. This will allow you to try something, and if you don't like it, just click the back button, make a change and try again.

 

As for the number of runs, the minimum number of runs is the total number of model terms that you are trying to estimate. For 10 2-level factors you would have an intercept (1) + main effects (10) + two-way interactions (45). That gives a total of 56 terms, so that is the minimum number of runs.
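The term count above is simple enough to check for other numbers of factors. A minimal Python sketch, assuming a model with an intercept, all main effects, and all two-way interactions of two-level factors; the function name `model_term_count` is illustrative.

```python
from math import comb

def model_term_count(n_factors):
    """Number of terms in an intercept + main effects + two-way interaction model."""
    intercept = 1
    main_effects = n_factors
    two_way_interactions = comb(n_factors, 2)  # one term per pair of factors
    return intercept + main_effects + two_way_interactions

print(model_term_count(10))
# 56
```

This count is only the lower bound on runs described above; it says nothing about how good a design with that many runs will be.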

 

The Default number of runs has some built-in criteria that will provide some degrees of freedom for error (usually 4) and strive for orthogonality (to a point). It is not a straight formula. For this same situation, the default number of runs is 60 (which is why Ryan generated that design -- and notice 4 degrees of freedom for error). The Default number of runs does not take into account any constraints (to my knowledge!). Therefore, if you create this same design with or without constraints the default remains at 60. The default does NOT guarantee a design that will meet your needs! As you noted, when you add constraints, finding a good design can be more challenging and will often require more runs. Since 60 gave horrible results, go with a higher number. There is no easy calculation that I know of that will tell you the number of runs required to get a certain level for the Average Variance of Prediction (but don't forget about the VARIANCE on the variance of prediction! In other words, don't forget about the Fraction of Design Space Plot).

 

Hope this helps and happy hunting for a good design!

Dan Obermiller
ggmst
Level II

Re: How do I set constraints for categorical variable

Dan, thank you again for a very helpful response.  I am happily hunting for a good design, but find myself stymied again...

I am trying to make a graph that plots the Design Diagnostics (of particular interest is Average Variance of Prediction) as a function of number of categorical factors at the high level.  In all cases, I have ten two-level factors, and I only allow a set number of runs, 100.   I am sort of expecting the design to degrade when I have "too many" or "too few" allowed at the high level.

 

Thus, I am running the DOE script over and over with a series of different constraints on how many factors are allowed at the high level, i.e.:

Disallowed Combinations( X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10 != 2 )

Disallowed Combinations( X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10 != 3 )

Disallowed Combinations( X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10 != 4 )

Disallowed Combinations( X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10 != 5 )

Disallowed Combinations( X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10 != 6 )

Disallowed Combinations( X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10 != 7 )

Disallowed Combinations( X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10 != 8 )

(It would be nice to have this in a loop, but I just copy-and-paste the script).

What I find is that only the ones with 2 or 8 factors at the high level converge and deliver the design! For the others, I get the error message "optimal designer failed to converge".  Is there any way around this?
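The copy-and-paste step could be avoided by generating the constraint lines programmatically. The sketch below is in Python rather than JSL and only builds the expression text for pasting into the DOE script; it does not call JMP. The helper name `constraint_expression` and its parameters are hypothetical.

```python
def constraint_expression(n_factors, op, k):
    """Build the text of a Disallowed Combinations clause over X1..Xn."""
    total = " + ".join(f"X{i}" for i in range(1, n_factors + 1))
    return f"Disallowed Combinations( {total} {op} {k} )"

# One clause per target count, matching the series listed above.
for k in range(2, 9):
    print(constraint_expression(10, "!=", k))
```

Passing a different operator, for example `">"`, produces the "at most k" form used earlier in the thread.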

 

 

Re: How do I set constraints for categorical variable

When you say not equal, you are restricting the algorithm too much with only 100 runs. You may not be able to estimate all of the model terms. 

 

For example, when you restrict to only 3 factors being at their high setting, there are 120 possible combinations. Even if you choose all 120 combinations you cannot estimate a main effects model. Why? Because you have a built-in singularity. Specifically the intercept (a column of 1's in the design matrix) would be equal to a linear combination of all of the factors: 1/3*(x1+x2+x3+x4+x5+x6+x7+x8+x9+x10). Therefore, the analysis cannot be completed and no design could be found for such a model.

 

You may be better off by going back to saying "no more than" 3 or 4 or whatever level you wish.
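The singularity described above can be checked numerically. The following Python sketch assumes 0/1 coding (as in Ryan's discrete numeric script) and a main-effects model only. It builds the model matrix over every candidate run and computes its rank exactly, first with exactly 3 factors high and then with at most 3 high. The helper names `matrix_rank` and `main_effects_matrix` are illustrative.

```python
from fractions import Fraction
from itertools import product

def matrix_rank(rows):
    """Rank via exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current pivot position with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, len(m)):
            if m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def main_effects_matrix(points):
    """Model matrix with an intercept column followed by the factor columns."""
    return [[1, *p] for p in points]

full_factorial = list(product((0, 1), repeat=10))
exactly3 = [p for p in full_factorial if sum(p) == 3]
at_most3 = [p for p in full_factorial if sum(p) <= 3]

rank_exact = matrix_rank(main_effects_matrix(exactly3))
rank_at_most = matrix_rank(main_effects_matrix(at_most3))

print(len(exactly3), rank_exact)     # 120 10
print(len(at_most3), rank_at_most)   # 176 11
```

With exactly 3 factors high, the 11 model columns have rank 10, so even using all 120 runs leaves one term inestimable. With "no more than 3", the rank is the full 11.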

Dan Obermiller