Discussions

yiyichu · Jun 8, 2023 2:02 PM

I would like to evaluate the influence of 8 factors on the response. There is 1 continuous factor, 7 categorical factors, and 1 categorical response. Two of the categorical factors have 4 levels, and one has 8 levels. Can we use the classical screening design method to design the experiments?

We want to identify the most influential of the 8 factors. Where can we analyze the screening experiment results? When we add more than 6 factors, or when a factor has more than 3 levels, the "Screening" tab disappears, so we cannot analyze the screening design results. What is the problem? Does this mean the classical screening design cannot be used in this case, or should we analyze the screening design results separately?

Thanks.

yiyichu · Jan 28, 2021 02:54 PM

@statman Thank you for your advice.

If I chose to use Method 3: RCBD as you mentioned to replicate the treatments over 2 blocks (best and worst locations), it is like the following:

The default number of runs is 16, and then the "Color Map on Correlations" is as follows:

From the above color map, it seems the model is not the main effect model. In general terms, the color map for a good design shows a lot of white off the diagonal, indicating orthogonality or small correlations between distinct terms. Does this mean that this is not a good design? If I increase the number of runs, the map shows mostly white off the diagonal. Does this mean I need to increase the number of runs when I design the experiment?

And one more question here:

Our response is a binary value (0/1): 0 indicates no occupants present and 1 indicates occupants present. For the response, I used "Maximum" as the Goal, as follows:

Is this proper, or should I use "None" as the Goal?

statman · Jan 28, 2021 06:26 PM

While both @P_Bartell and I are trying to give you some advice, you should realize there is no "right" design. Every design will potentially vary in the resources required, the precision of detecting design factor effects, the resolution, the model that can be estimated, the number of factors that can be estimated, etc. What I recommend is to design multiple experiments and evaluate them against the set of criteria and your situation. I also suggest you predict all possible outcomes and what you will do in each outcome. You have 8 design factors and at least 1 noise variable. The purpose of this 1st experiment is to design a better experiment (move the design space and identify the factors that are most interesting). Usually a lower resolution will work to identify main effects, but you should rank order the model effects up to 2nd order to determine whether you need to bump the resolution. Also when you have JMP block the design, the software treats the block as a random effect by default. However, when you know what noise is making up the block, you can treat it as a fixed effect and quantify block-by-factor interactions. I'm a bit old school you might say. I want to know the aliasing and don't like to do partial aliasing as this is more difficult to interpret and determine what the subsequent experiment should be. But I am probably in the minority here. Many folks like the optimal designs for their efficiency. I also don't fill in any Y's as I don't want the software to "control" my analysis (in some cases, limit my analysis options).
Regarding your response variable, have you thought about other ways to quantify the response? The binary response requires large sample sizes to detect differences. Could you vary the size of the "object" or "person" and record what size object gets detected? Or could you vary the amount of "motion" it takes to get detected? If distance from sensor matters, instead of blocking on location in the room, could you determine what distance from the sensor the object is detected? In any case, I would recommend more thought on response variables.

"All models are wrong, some are useful" G.E.P. Box

yiyichu · Jan 29, 2021 09:17 AM

@statman Based on your idea here, the block design will bring partial aliasing since it will quantify block-by-factor interactions. And we don't want to do the partial aliasing for the first experiment, am I right? And you said, "you didn't fill any Ys", does it mean you didn't input "response"?

Our response variable is binary Presence/Non-presence, which means whether the occupancy sensor detects people or not in space. Actually, the other way we could interpret the response is to count the number of people in a testing room. And then we could decide the sensor Presence/Non-presence based on the results later. If the count number is 0, then "Non-presence"; if it is not 0, then "Presence". Other than that, I couldn't think of another way to interpret the response. Or maybe you have some good thoughts? Would this be helpful to improve the case? Unfortunately, we couldn't detect the size of objects/people by using the occupancy sensor. The motion has four categories, major -- people walking, minor -- people extending their arms, fine -- people typing, none -- sleeping. What do you mean by varying the amount of motion? And the location influences the performance of the sensor not only because of the distance but also the angle from the sensor to people. If we want something to replace location, maybe we should include both distance and angle variables. And then we define the extreme values for distance and angle at two levels. Would this be helpful?

statman · Jan 29, 2021 10:08 AM

To clarify, partial aliasing is when an effect is neither orthogonal to nor fully aliased with another effect. There is nothing wrong with aliasing as long as you know what is aliased. In fact, aliasing is an efficiency strategy. Why separate all effects in the first study? In particular, you don't need to separate higher-order terms (factorial or polynomial) in your first experiment. You need to decide which effects you want to estimate, which you are willing to alias, and which will be restricted. If an effect is significant and you know what it is aliased with, the next iteration can certainly de-alias those effects. The problem with partial aliasing is that the effect is spread out over multiple effects and therefore de-aliasing is not practically possible.
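
A small sketch, outside JMP, may make the distinction between full and partial aliasing concrete. It assumes two textbook designs that are not part of this thread: a 4-run 2^(3-1) fraction with generator C = A*B, and the 12-run Plackett-Burman design built from its standard generator row. The helper name `corr` is illustrative only.

```python
import numpy as np
from itertools import combinations

def corr(u, v):
    """Pearson correlation between two model-matrix columns."""
    return float(np.corrcoef(u, v)[0, 1])

# Regular fraction: 2^(3-1) design with generator C = A*B.
A = np.array([-1, 1, -1, 1])
B = np.array([-1, -1, 1, 1])
C = A * B
full_alias = corr(C, A * B)  # C and the AB interaction are the same column

# 12-run Plackett-Burman design: cyclic shifts of the standard
# generator row, plus a final row of all -1.
gen = np.array([1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1])
pb12 = np.vstack([np.roll(gen, s) for s in range(11)]
                 + [-np.ones(11, dtype=int)])

# Absolute correlation of each main effect with every two-factor
# interaction that does not involve that factor.
partial = {
    round(abs(corr(pb12[:, k], pb12[:, i] * pb12[:, j])), 4)
    for i, j in combinations(range(11), 2)
    for k in range(11) if k not in (i, j)
}

print("2^(3-1): |corr(C, AB)| =", full_alias)
print("PB12: distinct |corr(main, 2FI)| values =", partial)
```

In the regular fraction the correlation is either 0 or 1, so it is always known exactly which effects are confounded. In the Plackett-Burman design each two-factor interaction is partly correlated with many main effects at once, which is the "spread out over multiple effects" situation described above.
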
Regarding response variables: I don't know what type of sensor you are using, but think of how it "senses". Is it active or passive? If it is passive, is it using IR to detect the motion? How does the sensor process that signal? Can that be measured/quantified? Do you have an engineer who understands the science? I would work on a better measurement.

In designing an experiment, you need to decide what resolution you need for the design factors and what resolution you need for the Noise (Resolution is typically not used in reference to noise...Very few people spend time on understanding the noise even though studying it is required for robust design). Here is a thought experiment: Let's say the location where someone is in the room is noise. You want your device to be robust to noise (where someone is in the room). You are experimenting on design factors (sensor sensitivity, angle, size, etc.) that may or may not affect the performance of the sensing device. One of those factors is significant, but its effect (magnitude and/or direction) depends on noise (as estimated by the block-by-factor interaction you obtained from the RCBD). You want to know that early in design so you can either find a setting for that factor, find another factor, or make the factor adjustable in the field so that the noise-by-factor interaction effect is reduced. Otherwise, the performance of the device will depend on noise, which means you will have customer complaints at some point.

"All models are wrong, some are useful" G.E.P. Box

yiyichu · Feb 3, 2021 02:19 PM

@statman I understand your explanation of why the location should be considered noise. But I think I need to explore more on how to decide the resolution. Do you know of any resources/papers that could help?

For the response variable, we do use a PIR sensor, which is passive and uses IR to detect motion. But we also need to test some sensor systems that combine different types of sensors. In this case, the result is calculated from the different sensor outputs; it is not a single measurement... And only the final binary output is provided to us when we run the test. Maybe this case is more complicated for experimental design...

statman · Feb 4, 2021 01:29 PM

There are many papers that discuss options for handling noise. Fisher (who invented blocking), Box, et al. have numerous papers/books.

Box, George, Hunter, William, and Hunter, J. Stuart (2005) “Statistics for Experimenters: Design, Innovation and Discovery” Wiley & Sons (ISBN 0471718130) covers it well. This paper is excellent:

Sanders, D., Leitnaker M., and McLean R. (2002) “Randomized Complete Block Designs in Industrial Studies” Quality Engineering, Vol. 14, Issue 1

One of my favorites:

Daniel, Cuthbert (1976) “Applications of Statistics to Industrial Experiments” Wiley (ISBN 0-471-19469-7)

Ultimate paper on split-plot designs (IMHO):

Box, G.E.P., Stephen Jones (1992), “Split-plot designs for robust product experimentation”, Journal of Applied Statistics, Vol. 19, No. 1

I have also attached an overly simplified discussion that I use to support lectures on the subject.

"All models are wrong, some are useful" G.E.P. Box

Mark_Bailey · Jan 29, 2021 08:22 AM

Sorry to join this discussion so late.

The Location seems to be a categorical factor with fixed effects. Is it really a restriction on availability? That is, you can only make a limited number of observations per location? How many locations are at your disposal? The fact that you called the two locations "best" and "worst" suggests a qualitative or categorical factor to me.

The model by default contains only the main (additive) effects. The color map shows you higher-order terms, too, so that you can assess the impact of these effects (through their correlations) on the main effects. If such effects are active, they will bias your main-effect estimates if you do not include them in the model, and they will increase the standard errors of the estimates if you do include them.
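
The bias from leaving an active interaction out of the model can be sketched numerically. This is an illustration under stated assumptions, not JMP output: the design is an 8-run 2^(4-1) fraction with the deliberately poor generator D = A*B, and the response coefficients (10, 3, 2) are made up for the example. The alias matrix is computed with the usual formula (X1'X1)^-1 X1'X2, where X1 holds the fitted terms and X2 the omitted ones.

```python
import numpy as np
from itertools import product

# 8-run 2^(4-1) fraction with the (deliberately poor) generator D = A*B.
runs = np.array(list(product([-1, 1], repeat=3)))
A, B, C = runs.T
D = A * B

# Fitted main-effects model and the omitted two-factor interactions.
X1 = np.column_stack([np.ones(8), A, B, C, D])
X2 = np.column_stack([A * B, A * C, A * D, B * C, B * D, C * D])

# Alias matrix: how much of each omitted term leaks into each
# fitted coefficient (rows: Intercept, A, B, C, D).
alias = np.linalg.solve(X1.T @ X1, X1.T @ X2)

# Noise-free response where only A and the A*B interaction are active.
# The coefficient values are hypothetical.
y = 10 + 3 * A + 2 * (A * B)
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

print(np.round(alias, 3))
print(dict(zip(["Intercept", "A", "B", "C", "D"], np.round(coef, 3))))
```

Factor D has no real effect in this made-up response, yet the main-effects fit credits it with the whole A*B interaction, because the alias matrix has a 1 in the D row and A*B column. A dark off-diagonal cell in the color map is a warning of exactly this kind of leakage.
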

What is the response? Is it binary or is it a count of two binary outcomes? For example, I might observe 20 instances in each run and record the number of each outcome (succeed, fail). On the other hand, I might only observe 1 instance in each run. You cannot use the power analysis in custom design in the case of such a response. But you certainly need more runs for such a response. The Simulate Response command in Custom Design along with the Simulate feature in JMP Pro can be used to assess the empirical power of any design.
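
To see why a single yes/no observation per run calls for many runs, the idea of empirical power can be sketched with a short Monte Carlo simulation. This is not the JMP Simulate feature, only a rough stand-in: it compares two levels of one factor with a two-proportion z-test at a two-sided alpha of 0.05, and the detection probabilities 0.6 and 0.9 are hypothetical. The function name `simulated_power` is illustrative.

```python
import numpy as np

def simulated_power(p_low, p_high, runs_per_level, n_sim=5000, seed=1):
    """Estimate the power of a two-sided two-proportion z-test (alpha = 0.05)."""
    rng = np.random.default_rng(seed)
    # One Bernoulli observation per run, so each level yields a binomial count.
    x_low = rng.binomial(runs_per_level, p_low, size=n_sim)
    x_high = rng.binomial(runs_per_level, p_high, size=n_sim)
    diff = (x_high - x_low) / runs_per_level
    pooled = (x_high + x_low) / (2 * runs_per_level)
    se = np.sqrt(pooled * (1 - pooled) * 2 / runs_per_level)
    # If every simulated run gives the same outcome, se is 0 and the
    # test cannot reject; z is left at 0 for those cases.
    z = np.divide(diff, se, out=np.zeros_like(diff), where=se > 0)
    return float(np.mean(np.abs(z) > 1.96))

for n in (8, 32, 100):
    print(n, "runs per level -> estimated power", simulated_power(0.6, 0.9, n))
```

With only a handful of runs per level, even a large difference in detection probability is usually missed, and the estimated power rises as runs are added. A simulation for the actual design would need the full factor structure and a logistic model, which this sketch does not attempt.
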

The goal for the response is saved as the Response Limits column property. It defines the desirability function for optimization in the prediction profiler. I do not think it is appropriate for this case, but you can change it after you get to the profiler, and then save the updated definition.

yiyichu · Jan 29, 2021 09:34 AM

@Mark_Bailey Thank you for providing your thoughts.

For the location, the background is that we set up the occupancy sensors, people stand in different locations in a testing space, and we then evaluate whether the occupancy sensor detects occupants or not, because we want to know whether people's location affects the performance of the occupancy sensors. Initially, we chose 8 random locations as 8 levels for the location variable. But based on our discussion with @statman, it is better not to use 8 levels for the first experimental design, which is why we decided to choose the extreme cases for the location: the worst case, where the sensor couldn't detect occupants, and the best case, where people stand right in front of the sensor so the sensor detects occupants perfectly. So maybe it is not a restriction on availability? It is just a variable we want to vary to see how it impacts the response?

For the response, it is a binary value that we observe 1 instance in each run. For our case, the response is the Presence/Non-presence of people. Presence means the sensor detects occupants, Non-presence means the sensor doesn't detect occupants. So in this case, what should I choose for the goal of the response?

Mark_Bailey · Jan 29, 2021 10:08 AM

Thanks for your explanation.

I would treat the Location factor as a simple Categorical factor, not a blocking factor.

I would define the response in Custom Design to have no goal. It will be easier to define the desirability function later in the profiler.

A single trial (one yes or no result) means that you will need a lot of runs to estimate the logistic regression or the binomial GLM model well.

statman · Jan 29, 2021 10:28 AM

On this point I disagree with @Mark_Bailey. Location in the room is noise and as such it should be treated as noise in experimentation. You don't want to conclude that location is significant and that people have to stand in a particular spot for it to work. You won't sell many of those sensors, or the ones you do sell will generate many customer complaints ("It only works when I stand here").

"All models are wrong, some are useful" G.E.P. Box

Could a screening design be used for 1 continuous factor, 7 other categorical factors, and 1 categorical response?
