
How can I simulate a trial execution model for a virtual DoE?

rabotareferat0

Community Trekker

Joined:

Sep 4, 2016

Could you please help me with this problem?
I want to create a computer emulator for learning purposes. It will emulate a binary outcome and use several Yes/No factors as inputs (let's say A, B, and C).
The output will be the percentage of successes.
 
When all factors are at "-", the success rate is 5%. Setting factors to "+" gives:
A at "+": 5.5% (a 0.5 percentage-point gain)
B at "+": 6.5% (a 1.5 percentage-point gain)
C at "+": 5% (no gain)
A and B at "+" (AB interaction): 7.5% (a 2.5 percentage-point gain)
 
I want to implement this in software as a trial execution model. It should produce a list of random trials and their outputs, depending on the levels of factors A, B, and C.
 
Example of the software's behavior:

Trial    A B C    Output
1        + + +    Fail
2        + - +    Fail
3        + - -    Success
...
10000    + - -    Fail
 
 
I want it to look realistic, so the combination A+ B- C- should not always result in "Success". The process is not physical; it is a human response.

 

What are the general steps for constructing such an algorithm?
Can it be done in JMP?

I am also considering other tools if needed, such as Excel, R, or simulation software like AnyLogic or ProModel.

 

Big thanks!

Dan_Obermiller

Joined:

Apr 3, 2013

This could probably be done more elegantly, but a quick "brute force" approach is to enter the random formulas into a JMP data table. I made the A, B, and C columns random integer columns between 0 and 1 (I chose 0 instead of -1 for ease). For the "Success?" column, I entered a string of "If" conditions. Each one draws from a binomial distribution with the appropriate probability of success. This is fairly simple here because there are only 8 possible combinations of A, B, and C; for a larger problem I would try to be more elegant. The number of rows in your table is the number of random trials you wish to have. For this simple case I used just 10 rows.

 

 

// Each row is one simulated trial; change Add Rows() to set the number of trials.
New Table( "Untitled",
    Add Rows( 10 ),
    // Factors are coded 0 (= "-") and 1 (= "+"), each drawn at random per row.
    New Column( "A", Numeric, "Nominal", Format( "Best", 12 ),
        Formula( Random Integer( 0, 1 ) ) ),
    New Column( "B", Numeric, "Nominal", Format( "Best", 12 ),
        Formula( Random Integer( 0, 1 ) ) ),
    New Column( "C", Numeric, "Nominal", Format( "Best", 12 ),
        Formula( Random Integer( 0, 1 ) ) ),
    // Outcome: a single Bernoulli draw (1 = success, 0 = fail) using the
    // success probability for this row's combination of A, B, and C.
    New Column( "Success?", Numeric, "Nominal", Format( "Best", 12 ),
        Formula(
            If(
                :A == 0 & :B == 0 & :C == 0, Random Binomial( 1, 0.05 ),
                :A == 0 & :B == 0 & :C == 1, Random Binomial( 1, 0.05 ),
                :A == 0 & :B == 1 & :C == 0, Random Binomial( 1, 0.065 ),
                :A == 0 & :B == 1 & :C == 1, Random Binomial( 1, 0.065 ),
                :A == 1 & :B == 0 & :C == 0, Random Binomial( 1, 0.055 ),
                :A == 1 & :B == 0 & :C == 1, Random Binomial( 1, 0.055 ),
                :A == 1 & :B == 1 & :C == 0, Random Binomial( 1, 0.075 ),
                :A == 1 & :B == 1 & :C == 1, Random Binomial( 1, 0.075 ),
                // Final else clause; not reached when A, B, and C are all 0 or 1.
                Random Binomial( 1, 0.05 )
            )
        )
    )
)
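The question also mentions tools outside JMP, so the same brute-force idea can be sketched in Python. This is a minimal sketch, not Dan's script. It assumes the same 0/1 coding and the same eight success probabilities as the JSL formula above. A dictionary lookup replaces the chain of If conditions. The function name `simulate_trials` and its `seed` parameter are illustrative, not from the thread.

```python
import random

# Success probability for each (A, B, C) combination, coded 0 = "-" and 1 = "+".
# These mirror the probabilities in the JSL formula above; C has no effect.
SUCCESS_PROB = {
    (0, 0, 0): 0.050, (0, 0, 1): 0.050,
    (0, 1, 0): 0.065, (0, 1, 1): 0.065,
    (1, 0, 0): 0.055, (1, 0, 1): 0.055,
    (1, 1, 0): 0.075, (1, 1, 1): 0.075,
}


def simulate_trials(n_trials, seed=None):
    """Return a list of (trial_number, a, b, c, success) tuples.

    Each factor is drawn independently as 0 or 1 with equal probability.
    Success is one Bernoulli draw with that cell's probability, so the
    same factor settings do not always give the same outcome.
    """
    rng = random.Random(seed)
    trials = []
    for i in range(1, n_trials + 1):
        a, b, c = rng.randint(0, 1), rng.randint(0, 1), rng.randint(0, 1)
        success = rng.random() < SUCCESS_PROB[(a, b, c)]
        trials.append((i, a, b, c, success))
    return trials


if __name__ == "__main__":
    for i, a, b, c, success in simulate_trials(10, seed=1):
        signs = " ".join("+" if x else "-" for x in (a, b, c))
        print(f"{i:>5}  {signs}  {'Success' if success else 'Fail'}")
```

Passing a fixed `seed` makes a run reproducible; leave it out to get a fresh set of trials each time.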

 

 

Dan Obermiller
rabotareferat0

Community Trekker

Joined:

Sep 4, 2016

Thank you very, very much for this script! That is what I was searching for.

Could you please tell me why, when I generated 10,000 rows, the Nominal Logistic regression doesn't show A and AB as significant effects?

 

[Screenshot: 2017-11-08_22-32-01.png]

 

 

 

 

Dan_Obermiller

Joined:

Apr 3, 2013

Of course every simulation is different, but you are looking for very small effect sizes. With 10,000 trials, you will have approximately 5000 trials with A at 1 and 5000 trials with A at 0. As a rough illustration that treats B as if it were at 0: of the 5000 trials at 0, you expect a 5% "success" rate, which is about 250 successes. Of the 5000 at 1, you expect a 5.5% success rate, which is about 275 successes. That is only a 25-observation difference out of the 10,000 total. Add random noise to this, and the observed difference might be even smaller.

 

Categorical data typically require larger sample sizes. That, coupled with looking for very small effects, leads to non-significance. Stated in more statistical terms, you have very low power to detect your desired effect sizes.
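To put a rough number on "very low power", the usual normal approximation for comparing two proportions can be applied. This sketch takes the 5% versus 5.5% comparison and the 5000 trials per group from the reply above at face value. It also assumes a two-sided test at α = 0.05. JMP's own power calculations may use a different method, so treat the result as a ballpark figure. The function name `two_proportion_power` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_power(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions."""
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)
    se = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    shift = abs(p1 - p2) / se
    # Probability that the test statistic falls beyond either critical value.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    print(f"{two_proportion_power(0.05, 0.055, 5000):.2f}")
```

Under these assumptions the power comes out at roughly 20%. That means about four out of five such simulations would miss the 0.5-point difference.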

Dan Obermiller
rabotareferat0

Community Trekker

Joined:

Sep 4, 2016

Many thanks for clarification!

I calculated the minimum sample size needed to detect a difference of 0.5%. However, even with 64,000 trials, the analysis of the simulated results sometimes doesn't detect the AB interaction as significant.

 

[Screenshot: 2017-11-09_15-46-44.png]

 

 

Have I done the calculations right?

 

[Screenshot: 2017-11-09_15-42-42.png]

Dan_Obermiller

Joined:

Apr 3, 2013

You are fitting models that are more complex than this sample size calculator was intended for. Extra degrees of freedom are being used to estimate the other terms.
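For context, here is a sketch of a simple two-proportion sample-size calculation. It is not the calculator used in the thread. It assumes a two-sided test at α = 0.05, 80% power, and a single comparison of 5% versus 5.5%. It ignores the extra terms in a full logistic model, which is exactly the limitation described above. The name `n_per_group` is illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate trials per group to detect p1 vs p2 (normal approximation)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


if __name__ == "__main__":
    n = n_per_group(0.05, 0.055)
    print(n, "per group,", 2 * n, "in total")
```

Under these assumptions the result is roughly 31,000 trials per group. That is about 62,000 in total, the same range as the 64,000 trials above. A smaller target difference, such as the interaction effect, needs considerably more.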

 

I would suggest you go back to the formulas for each of the 8 different scenarios (combinations of A, B, and C) to make sure that you are using the proper probabilities. You had only provided four of the eight situations so I made some assumptions about the other four cases. You may wish to check those to make sure everything is as you expected it to be. I am guessing you are expecting the interaction effect to be a 2.5% gain, but the interaction term is NOT a 2.5% gain. You only get a 2.5% gain when both A and B are positive as you indicated. The interaction also includes what happens for the other 3 combinations of A and B.

 

As an example:
A and B both positive: probability is 7.5%
A and B both zero: probability is 5%. So when they have "the same sign", the average probability is 6.25%.

A zero, B positive: probability is 6.5%
A positive, B zero: probability is 5.5%, for an average probability of 6%.

 

 

Therefore, the interaction effect is actually 6.25% - 6% = 0.25%, which is really small (and pretty close to your A*B interaction estimate in your analysis).
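The same arithmetic gives all the main effects and the A*B interaction at once. This sketch assumes the eight cell probabilities from the JSL script earlier in the thread, where C has no effect. Each effect is the average success probability where its contrast is "+" minus the average where it is "-". For A*B, "+" means A and B have the same sign. The names `CELL`, `PROB`, `effect`, and `EFFECTS` are illustrative.

```python
from itertools import product

# Success probability (%) keyed by (A, B), coded -1 = "-" and +1 = "+".
# Values match the JSL script earlier in the thread.
CELL = {(-1, -1): 5.0, (-1, 1): 6.5, (1, -1): 5.5, (1, 1): 7.5}
# Expand to all eight (A, B, C) settings; C does not change the probability.
PROB = {(a, b, c): CELL[(a, b)] for a, b, c in product((-1, 1), repeat=3)}


def effect(sign_of):
    """Average probability where sign_of(a, b, c) > 0 minus where it is < 0."""
    plus = [p for k, p in PROB.items() if sign_of(*k) > 0]
    minus = [p for k, p in PROB.items() if sign_of(*k) < 0]
    return sum(plus) / len(plus) - sum(minus) / len(minus)


EFFECTS = {
    "A": effect(lambda a, b, c: a),
    "B": effect(lambda a, b, c: b),
    "C": effect(lambda a, b, c: c),
    "A*B": effect(lambda a, b, c: a * b),  # same sign -> +1, opposite -> -1
}

if __name__ == "__main__":
    for name, value in EFFECTS.items():
        print(f"{name:>4}: {value:+.2f} percentage points")
```

This gives effects of 0.75 for A, 1.75 for B, 0 for C, and 0.25 for A*B, all in percentage points on the probability scale. The Nominal Logistic report's parameter estimates are on the log-odds scale, so they will not match these numbers directly.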

 

Dan Obermiller