hanyu119
Level III

Definitive Screening Design DOE for a large number of factors

Dear JMP experts,

I'm running a DOE on a dual-lens design that couples light into a fibre array. My goal is to estimate the yield of the system when both the manufacturing tolerances and the decenter and tilt of both lenses are taken into account. I have 15 factors and 4 responses in the DOE. The 4 responses CE1, CE2, CE3 and CE4 correspond to the coupling efficiency of each fibre in the fibre array.

I tried a Definitive Screening Design (DSD) for the initial screening because of its advantages over traditional fractional factorial designs. As can be seen from the simulation data, the difference between CE1 and CE4 should be negligible, as should the difference between CE2 and CE3. However, the DSD screening gives me a different set of statistically significant factors for each of the four responses.

In particular, I know for a fact that l1_tilt_y is a dominant factor for CE1 and CE4 and that it has a nonlinear relationship with both responses, yet the DSD screening does not even pick it up for CE4.

The JMP report file with data table embedded is attached.

Could you please share your thoughts?

Thank you.

Best regards,

Eric


Re: Definitive Screening Design DOE for a large number of factors

The data table that you provided contains the design columns but no response data. The evaluation of the design is as expected. The evaluation of the model is not possible, though.

 

I am not familiar with this lens design, but I have often encountered preconceived ideas about what was or was not known from previous experience. Many such situations were based on anecdotal stories that had never been tested or scrutinized with a statistical approach. So I am not yet concerned that the outcome surprised you. Still, let's see what we can do to sort it out.

hanyu119
Level III

Re: Definitive Screening Design DOE for a large number of factors

Dear Mark,

Many thanks for the prompt reply.

Apologies for attaching the wrong JMP file.

I've reattached a JMP report file with the data table embedded.

I hope that's all right now.

I look forward to hearing your thoughts.

Cheers.

 

Re: Definitive Screening Design DOE for a large number of factors

You have some unusual observations for CE1 and CE4. The residuals are unusual. For example, here is the start of the plots for CE1:

resid.PNG

These observations can have unusually high influence on the estimates of the model parameters and, therefore, on the hypothesis tests of whether each estimate differs from zero.

hanyu119
Level III

Re: Definitive Screening Design DOE for a large number of factors

That's a great finding, Mark! I never paid much attention to the Main Effects Residual Plots; now I know how to use them.

I've validated my simulation data and all those outliers seem to be real, so I can't really remove them from my DOE.

What's the sensible next step that you recommend?

Is there a more reliable screening approach that you can share with me?

Thanks.

 

statman
Super User

Re: Definitive Screening Design DOE for a large number of factors

Much as Mark found, there are some unusual data points.  These may be due to noise rather than treatment effects. 

Definitive Screening Design - Multivariate.png

"All models are wrong, some are useful" G.E.P. Box
hanyu119
Level III

Re: Definitive Screening Design DOE for a large number of factors

The Multivariate is a great method to detect outliers. Thanks for sharing!

I've validated my data with the simulation and all the data is real.

Row 15 & Row 16: since l1_tilt_y is the most dominant factor in the system, when l1_tilt_y is 0, which is the case for Row 15 & Row 16, the coupling efficiency of CE1 and CE4 becomes very high compared to the other runs. That's why JMP flags them as outliers.

Row 37 is the centre point. Again, since l1_tilt_y is 0, CE1 and CE4 are high.

Row 14 is a special case. I've triple-checked my simulation and it sort of makes sense, so I can't remove this point either.
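
The pattern described for Rows 15, 16 and 37 can be illustrated with a minimal sketch. Everything numeric here is an assumption made up for illustration: the response function `simulated_ce`, its coefficients, and the split of 37 runs into 17 at each extreme and 3 at zero. It is not the actual lens simulation or the JMP analysis. It only shows that when a response peaks at the middle level of a factor, a straight-line fit has a slope near zero and leaves large positive residuals at the few zero-level runs.

```python
# Hypothetical response: efficiency peaks at zero tilt and drops quadratically.
# The coefficients are invented for illustration only.
def simulated_ce(tilt):
    return 0.9 - 0.6 * tilt ** 2

# Coded levels of one factor: most runs at -1 or +1, only three at 0
# (assumed split, loosely mirroring the three zero-tilt rows in the thread).
tilts = [-1] * 17 + [0] * 3 + [1] * 17
ce = [simulated_ce(t) for t in tilts]

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(tilts, ce)
residuals = [y - (intercept + slope * t) for t, y in zip(tilts, ce)]

# Group residuals by factor level to see where the straight line fails.
residual_by_level = {
    level: [r for t, r in zip(tilts, residuals) if t == level]
    for level in (-1, 0, 1)
}

print("slope:", slope)
for level, values in residual_by_level.items():
    print("level", level, "max residual:", max(values))
```

In this toy case the linear main effect is essentially zero even though the factor drives the response, and the zero-level runs stand out as the largest residuals. That is the same signature as the flagged rows, which points to a missing quadratic term rather than bad data.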

Based on this, does it mean DSD is not the right approach to use for this specific case?

What's the sensible next step that you recommend?

Thanks.

statman
Super User

Re: Definitive Screening Design DOE for a large number of factors

I'm confused by your statement: "I've validated my data with the simulation and all the data is real." How can you possibly validate the real data with simulation? Is this experiment run completely by simulation?

 

There are two distinctly different unusual components of your situation:

1. There may be unusual data points for each treatment as identified by the multivariate analysis (these are likely impacted by noise).  Your DSD does not have any strategy to handle noise (blocking, repeats, split-plots, etc.).  I suggest you spend some time understanding noise.

2. The model is not adequate (this is identified by the resulting issues with residuals as Mark points out).  There are times when the model doesn't do a very good job predicting the results.  Why?  This is where your investigation should head.

"All models are wrong, some are useful" G.E.P. Box
hanyu119
Level III

Re: Definitive Screening Design DOE for a large number of factors

This experiment is run entirely in simulation. It's a conceptual design, so I don't have an actual product to verify the simulation against. Having said that, the simulation should be pretty accurate since it's a very straightforward design.

Clearly, the DSD model is not adequate. What next?

Suppose you have 15 factors to start with in a screening DOE, and you know the response is nonlinear in some of them. How would you approach this problem?

Thanks.

statman
Super User

Re: Definitive Screening Design DOE for a large number of factors

Unless running the simulation takes a significant amount of time, you might as well run full factorials with as many levels as you want.  Of course, the problem is noise...not sure how you simulate this?  I actually don't understand why you would run simulation experiments.  The algorithm is already known in the software. The order of the model is already known.  If variables (or their non-linear components) are not in the algorithm, they can't possibly affect the response of the algorithm.
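
As a rough sketch of what a full factorial involves, the snippet below enumerates every level combination with Python's `itertools.product` and counts runs. The factor name `l1_decenter_x` and the level values are assumptions for illustration; only `l1_tilt_y` comes from this thread.

```python
from itertools import product

def full_factorial(levels_by_factor):
    """Return every combination of factor levels as a list of dicts."""
    names = list(levels_by_factor)
    level_lists = [levels_by_factor[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*level_lists)]

def run_count(n_factors, n_levels):
    """Number of runs in a full factorial with the same level count per factor."""
    return n_levels ** n_factors

# Two factors at three coded levels each (hypothetical levels).
small_design = full_factorial({
    "l1_tilt_y": [-1, 0, 1],
    "l1_decenter_x": [-1, 0, 1],
})
print(len(small_design))   # 9 runs
print(run_count(15, 2))    # 32768 runs for 15 two-level factors
print(run_count(15, 3))    # 14348907 runs for 15 three-level factors
```

The run count grows exponentially with the number of factors, so whether this is practical for all 15 factors depends entirely on how long one simulation takes.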

In real-world experiments, we build models in hierarchical order using the fundamental principles of sparsity, hierarchy and heredity. Typically we would start by determining linear main effects and add higher-order terms as the experiment space approaches what we deem optimal. This work is always sequential and iterative. Don't spend a lot of time and money learning about unimportant factors at less than optimum levels. Center point designs are efficient at testing the curvature inside the experiment space.
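
The center-point check for curvature can be sketched as follows. This is a toy example with a made-up noise-free response (`simulated_response` and its coefficients are assumptions), not a formal test: it compares the average of the two-level corner runs with the value at the center of the design.

```python
from itertools import product

# Hypothetical noise-free response: linear in x1, quadratic in x2.
def simulated_response(x1, x2):
    return 0.5 + 0.1 * x1 - 0.3 * x2 ** 2

# Two-level full factorial corners in coded units, plus one center point.
corners = list(product([-1, 1], repeat=2))
corner_mean = sum(simulated_response(x1, x2) for x1, x2 in corners) / len(corners)
center_value = simulated_response(0, 0)

# If the response were purely linear (with or without interactions),
# the corner average would equal the center value.
curvature = corner_mean - center_value
print("curvature estimate:", curvature)
```

The difference estimates the sum of the pure quadratic coefficients over all factors, so it signals that curvature exists but cannot say which factor causes it. With real, noisy data the difference would be judged against an estimate of run-to-run variation.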

"All models are wrong, some are useful" G.E.P. Box