Discussions

kPvDoE · Jun 8, 2023 2:05 PM

Dear JMP-Community,

For my master's thesis, I want to optimize the process for a hot beverage. I have 5 factors (X1–X5), one of which is a hard-to-change factor (X1), and 10 response variables (Y1–Y10). The responses are measured by a trained sensory panel (8–11 assessors).

I created a custom RSM design with 243 runs: the base design has 27 runs (the minimum number), but each assessor (let's assume 9 assessors) scores all 27 runs to get a more accurate score (27 × 9 = 243). Each assessor can evaluate 3 samples per whole plot (= day).
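
The run count above is simply base treatments × assessors. A minimal sketch of the resulting long-format table, assuming 9 assessors and that the 27 treatments are tasted in consecutive groups of 3 per day (the real whole-plot assignment would come from the JMP design table; the column names are hypothetical):

```python
from itertools import product

N_TREATMENTS = 27     # base RSM runs
N_ASSESSORS = 9       # assumed panel size (the panel has 8-11 people)
SAMPLES_PER_DAY = 3   # samples per whole plot (= day)

def long_layout(n_treatments=N_TREATMENTS, n_assessors=N_ASSESSORS,
                per_day=SAMPLES_PER_DAY):
    """One row per (treatment, assessor) rating. The day (whole plot) is
    assumed to be consecutive blocks of `per_day` treatments."""
    return [
        {"day": (t - 1) // per_day + 1, "treatment": t, "assessor": a}
        for t, a in product(range(1, n_treatments + 1), range(1, n_assessors + 1))
    ]

rows = long_layout()
print(len(rows))                            # 243 ratings in total
print(len({r["day"] for r in rows}))        # 9 whole plots (days)
print(len({r["treatment"] for r in rows}))  # but only 27 distinct treatments
```

The table has 243 rows, but only 27 independently prepared treatments; the repeated ratings of one treatment are not independent runs.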

I considered including the temperature of the sample in the cup as an uncontrolled factor. However, it is measured after the visual appearance has been evaluated and may therefore vary, depending on how much time a person takes to evaluate the appearance of the sample. The temperature in the cup could also correlate with the process (e.g., high temperature in the cup -> hotter water was used during the process -> could affect the overall flavor). Should I include it?

For the design evaluation, I was thinking of using the desirability function to optimize my responses simultaneously and obtain the optimal factor settings.
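
For reference, desirability does not fit the response models itself; it combines the predictions of models that have already been fitted. A minimal sketch of the classic Derringer–Suich form, which may differ in shape from the desirability curves drawn by JMP's Prediction Profiler (the limits and predictions below are hypothetical):

```python
import math

def d_maximize(y, low, high, s=1.0):
    """'Larger is better': 0 at or below `low`, 1 at or above `high`."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** s

def d_target(y, low, target, high, s=1.0, t=1.0):
    """'Hit the target': 1 at `target`, falling to 0 at `low` and `high`."""
    if y < low or y > high:
        return 0.0
    if y == target:
        return 1.0
    if y < target:
        return ((y - low) / (target - low)) ** s
    return ((high - y) / (high - target)) ** t

def overall_desirability(ds):
    """Geometric mean of the individual desirabilities; any zero gives zero."""
    if any(d == 0 for d in ds):
        return 0.0
    return math.exp(sum(math.log(d) for d in ds) / len(ds))

# Hypothetical predictions for two responses on the 0-100 intensity scale
d1 = d_maximize(50, low=0, high=100)           # 0.5
d2 = d_target(60, low=40, target=60, high=80)  # 1.0
print(overall_desirability([d1, d2]))          # about 0.707
```

Because the geometric mean is zero whenever a single response is unacceptable, the optimizer is pushed toward settings where every response is at least partly satisfactory.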

Now my questions:

I entered the factors as continuous, as I believe the estimates will be more precise that way. Ultimately, however, I would like to land on one of the already defined factor levels. For example, the process temperature is a factor (87–95 °C; the defined levels are 87, 89, 91, 93 and 95 °C), and I would prefer the optimization to hit one of these levels. Should I rather choose discrete factors, or just round the optimal settings to my factor levels (if appropriate)?
Is it correct to generate a design with 243 runs? I think I want to treat the "replicates" not as replicates but as repeated measures, to get a more precise result, as I said. But I want to take the variability of the assessors into account (that's why I don't want to use the mean values).
How can I include the assessors in my design? I generated an extra column and included the term as a random effect in my model. Also, how do I define its design role and data type?
And do the assessors always have to be the same 9 people, or can they vary from whole plot to whole plot? (Each assessor has an individual ID.)
Is the desirability function enough to build my model and optimize all responses simultaneously, or should I use other predictive modeling methods (Bootstrap Forest, K Nearest Neighbors, …) to get a good estimate of my model?
And I use a reference beverage to "calibrate" the panel (the assessors always drink and evaluate the reference first, and then the samples follow; the reference beverage comes from a different process). Should I include the reference in my model? If so, how?
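
To make "taking the variability of the assessors into account" concrete, here is a minimal sketch of the classical ANOVA (method-of-moments) estimate of the assessor variance component. It assumes a balanced table with exactly one rating per assessor per treatment, treats treatments as fixed and assessors as random, and ignores the whole-plot (day) structure. JMP estimates random effects by REML, which for balanced data like this typically gives the same value when the estimate is positive. The ratings are made up:

```python
from statistics import mean

def assessor_variance_component(ratings):
    """ratings[i][j]: score of assessor i for treatment j (balanced, no gaps).

    Model: rating = mean + treatment (fixed) + assessor (random) + error.
    Returns (assessor variance estimate, residual mean square).
    """
    n_assessors, n_treatments = len(ratings), len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    assessor_means = [mean(row) for row in ratings]
    treatment_means = [mean(col) for col in zip(*ratings)]

    ss_assessor = n_treatments * sum((m - grand) ** 2 for m in assessor_means)
    ss_error = sum(
        (ratings[i][j] - assessor_means[i] - treatment_means[j] + grand) ** 2
        for i in range(n_assessors)
        for j in range(n_treatments)
    )
    ms_assessor = ss_assessor / (n_assessors - 1)
    ms_error = ss_error / ((n_assessors - 1) * (n_treatments - 1))
    # E[MS_assessor] = sigma2_error + n_treatments * sigma2_assessor;
    # negative estimates are truncated to zero
    var_assessor = max(0.0, (ms_assessor - ms_error) / n_treatments)
    return var_assessor, ms_error

# Hypothetical: 3 assessors with constant offsets (-2, 0, +2) on 3 treatments
ratings = [[8, 18, 28], [10, 20, 30], [12, 22, 32]]
print(assessor_variance_component(ratings))  # (4.0, 0.0)
```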

Thanks for your help, and I'd appreciate it if you pointed out any errors in my design/model!

PS: The attached file contains my previous design, and for the fake evaluation I used some old sensory data (I haven't started my measurements yet).

statman · Jul 15, 2021 09:52 AM

@kPvDoE My first thought is: since this is part of your master's thesis, how much advice should we provide? I did not look at your design/model.

The multiple evaluations for each treatment should not be included in the creation of the design. These multiple evaluations are indeed repeats; they are not independent of the treatments and therefore do not provide additional degrees of freedom. The within-treatment "ratings" should be evaluated before summarizing them and then analyzing the treatment effects. The temperature factor poses a bit of a dilemma. Are there two temperatures, one for preparing (or manufacturing) the beverage and the other for when the beverage is evaluated? It sounds like you are experimenting on the design factor at 5 levels. Why? Do you expect some 4th-order polynomial effect, or are you trying to "pick a winner"? The temperature at which the beverage is consumed could be a covariate.
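
One way to follow this advice is to keep every rating for inspection but reduce the ratings to one summary per treatment before modeling the treatment effects, so the analysis sees 27 observations rather than 243. A minimal sketch, with a hypothetical (treatment, assessor, rating) tuple layout and made-up ratings:

```python
from collections import defaultdict
from statistics import mean, stdev

def summarize_by_treatment(rows):
    """rows: iterable of (treatment, assessor, rating).
    Returns {treatment: (mean, standard deviation, n)}."""
    by_treatment = defaultdict(list)
    for treatment, _assessor, rating in rows:
        by_treatment[treatment].append(rating)
    return {
        t: (mean(r), stdev(r) if len(r) > 1 else 0.0, len(r))
        for t, r in by_treatment.items()
    }

rows = [(1, "A", 40), (1, "B", 50), (1, "C", 60), (2, "A", 70), (2, "B", 72)]
print(summarize_by_treatment(rows))
```

The standard deviation column keeps the within-treatment spread visible, so an unusually inconsistent treatment still stands out after summarizing.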

For the within-treatment assessment, I assume you are using an ordinal sensory scale? For ordinal-scale evaluation, you should first evaluate the consistency of the evaluations across the 9 assessors and identify any potential outliers or bias. Plotting their responses and looking for systematic patterns would be useful. Then, if appropriate, summarize those evaluations with a mean/median and some measure of dispersion (range or standard deviation). As far as the reference goes, I suggest you have an actual sample of each category of the ordinal scale. The evaluation question should be "which sample does the test sample match?" rather than "which do you prefer?" (the latter involves personal bias). I have attached a simple paper I use to provide advice on sensory/ordinal scale usage.
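
A simple numeric companion to such plots is each assessor's average offset from the panel mean for the same treatment: an assessor who consistently scores high or low shows up as a large positive or negative offset. A minimal sketch with made-up data; for truly ordinal scores a median-based version would be more appropriate, and any threshold for calling an assessor biased is a judgment call:

```python
from collections import defaultdict
from statistics import mean

def assessor_offsets(rows):
    """rows: iterable of (treatment, assessor, rating).
    Returns each assessor's average deviation from the panel mean of the
    same treatment (positive = tends to score high)."""
    rows = list(rows)
    by_treatment = defaultdict(list)
    for treatment, _assessor, rating in rows:
        by_treatment[treatment].append(rating)
    panel_mean = {t: mean(r) for t, r in by_treatment.items()}

    deviations = defaultdict(list)
    for treatment, assessor, rating in rows:
        deviations[assessor].append(rating - panel_mean[treatment])
    return {a: mean(d) for a, d in deviations.items()}

rows = [(1, "A", 40), (1, "B", 50), (1, "C", 60),
        (2, "A", 60), (2, "B", 70), (2, "C", 80)]
print(assessor_offsets(rows))  # {'A': -10, 'B': 0, 'C': 10}
```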

"All models are wrong, some are useful" G.E.P. Box

kPvDoE · Jul 16, 2021 04:17 AM

Thanks for the fast reply, @statman.

I'm not a statistician and had never heard of DOE before my master's thesis, and unfortunately I don't have a supervisor who can guide me. That's why I was looking for help here in the community ;)

There are indeed two temperatures, as you described. For the production temperature, I'm not looking for 4th-order polynomial effects. The 5 levels can be selected on the machine, and I'm trying to "pick the best" of them.
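
If the temperature stays continuous in the model, as in the original question about rounding, the continuous optimum can be mapped to the nearest machine level afterwards. A minimal sketch, using the five levels given in the original question:

```python
ALLOWED_TEMPS_C = (87, 89, 91, 93, 95)  # machine levels from the question

def snap_to_level(x, levels=ALLOWED_TEMPS_C):
    """Return the allowed level closest to the continuous optimum x.
    Ties go to the first (lower) level in `levels`."""
    return min(levels, key=lambda level: abs(level - x))

print(snap_to_level(90.2))  # 91
print(snap_to_level(88.7))  # 89
print(snap_to_level(97.0))  # 95 (optimum outside the range is clamped)
```

Rounding is only safe if the predicted responses change little between the continuous optimum and the snapped level, so it is worth checking the predictions at the snapped setting (e.g., by setting that value in the Prediction Profiler).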

As for the consumption temperature: if I define it as a covariate, the temperatures must be known in advance and are added to my design. But I can't control this temperature for each treatment. Or does that not matter?
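
A measured covariate does not have to be fixed in the design: it can be recorded as an extra column during tasting and added to the model at analysis time. A minimal sketch of that idea as plain least squares (an ANCOVA-style fixed-effects regression, not the full mixed model with day and assessor effects), with hypothetical values and an exactly linear toy response so the recovered coefficients can be checked:

```python
def least_squares(X, y):
    """Solve min ||X b - y|| via the normal equations and Gauss-Jordan
    elimination (adequate for small, well-conditioned problems)."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    a = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]

# Hypothetical data: coded process temperature x (design factor) and
# cup temperature c (measured covariate, deg C); toy y = 10 + 2x + 0.5c
x = [-1, -1, 0, 0, 1, 1]
c = [60, 64, 58, 66, 61, 63]
y = [10 + 2 * xi + 0.5 * ci for xi, ci in zip(x, c)]

X = [[1.0, xi, ci] for xi, ci in zip(x, c)]  # intercept, factor, covariate
b0, b_x, b_c = least_squares(X, y)
print(round(b0, 6), round(b_x, 6), round(b_c, 6))  # 10.0 2.0 0.5
```

The values of `c` only become known after tasting, and nothing in the fit requires them in advance; in the real analysis the covariate would be one more continuous term next to the design factors.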

Actually, I'm using a variation of the quantitative descriptive analysis (QDA) method for my sensory evaluation, so I don't think I have ordinal data. The assessors rate the intensities of the defined attributes on an unstructured line scale with two verbal anchor points at the ends. After the measurement, the total length of the scale is mapped to numerical values between 0 (lowest intensity) and 100 (highest intensity). Therefore, the distances between values are known and meaningful. The reference samples are intended to calibrate the assessors' intensity ratings and make the evaluations more consistent.
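
The conversion described above is a linear rescaling of the mark position. A minimal sketch, assuming the mark is measured in millimeters from the low-intensity anchor and a hypothetical 150 mm line (the actual scale length is not given in the post):

```python
SCALE_LENGTH_MM = 150.0  # hypothetical line length

def line_mark_to_score(mark_mm, scale_length_mm=SCALE_LENGTH_MM):
    """Map a mark on an unstructured line scale to 0 (lowest) .. 100 (highest)."""
    if not 0 <= mark_mm <= scale_length_mm:
        raise ValueError("mark must lie on the scale")
    return 100.0 * mark_mm / scale_length_mm

print(line_mark_to_score(75))   # 50.0
print(line_mark_to_score(150))  # 100.0
```

Because the mapping is linear, equal distances on the line stay equal on the 0–100 scale, which is what makes the data interval rather than ordinal.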

But I think I can still use some advice from the paper. Do you also have one for quantitative data, or is evaluating consistency no different there?

statman · Jul 16, 2021 10:11 AM

Based on your reply, I suggest you read some books on DOE, or perhaps look through some of the online reference material on DOE provided by JMP, for example:

https://www.jmp.com/en_us/events/getting-started-with-jmp/doe-intro-kit.html

1. Pardon my oversimplification, but DOE is not a test. In experimentation, you are looking for clues about the causal structure behind each response variable. A test is done to pick a winner. This is why, when running an experiment, you would likely start with multiple factors set at 2 bold levels to assess the linear effects, screen out the less interesting factors, and then continue to iterate (move the space and augment) with the remaining factors. If you experiment on multiple factors at 2 levels, you can get estimates of linear main effects and interaction effects, depending on the resolution of the design. If you add levels, you can get estimates of non-linear effects (3 levels: quadratic, 4 levels: cubic, etc.). Typically, you build your models in a Taylor-series approach, where you start with linear main effects, then linear interaction effects, then quadratic terms, etc. (building the model from 1st to 2nd to 3rd order, and so on). But the intent is not to create a complex model; the intent is to keep the model as simple as possible while still having a model that works for prediction.

2. Covariates are a way of handling a factor that you cannot specifically control/manage but can measure. We incorporate this random variable into the model of fixed effects (from the DOE). Accounting for the covariate increases the precision of the experiment without compromising the inference space. Of course, you are now analyzing a mixed model.

3. I'm not familiar with the quantitative descriptive analysis method, but it sounds interesting. I am always concerned about human sensory perception as a response variable. From my cursory look, the measurement is interval. Achieving consistency and mitigating bias is challenging, and I don't see how this method handles either of these issues. There are many research papers on the number of units in a scale that can be used consistently for human perception.
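
To connect the Taylor-series model building in point 1 to run counts: the number of coefficients in a full polynomial model grows quickly with its order. A minimal sketch for the 5 factors in this thread; the counts include the intercept, a design needs at least as many distinct runs as the model has coefficients, and a factor needs at least order + 1 levels for that order to be estimable in it:

```python
from math import comb

def n_full_polynomial_terms(k, order):
    """Coefficients (incl. intercept) of a full polynomial of `order` in k factors."""
    return comb(k + order, order)

def n_main_plus_2fi_terms(k):
    """Intercept + k linear main effects + all two-factor interactions."""
    return 1 + k + comb(k, 2)

k = 5
print(n_full_polynomial_terms(k, 1))  # 6:  intercept + 5 main effects
print(n_main_plus_2fi_terms(k))       # 16: adds 10 two-factor interactions
print(n_full_polynomial_terms(k, 2))  # 21: adds 5 pure quadratic terms
print(n_full_polynomial_terms(k, 3))  # 56: full cubic
```

The jump from 21 to 56 coefficients is one practical reason to stop at the lowest order that still predicts well.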


How to create my model based on sensory data