
What is the Real Impact of Your DOE Factors: Automating Impact Ratio Calculations with JSL

Fujifilm Diosynth Biotechnologies is a contract development and manufacturing organisation (CDMO) that has a dedicated Process Characterisation department focused on performing process characterisation studies (PCS). The aim of PCS is to demonstrate that our customer processes are robust to changes in parameter settings within their normal operating ranges.

PCS commonly employ design of experiments (DOE) to investigate the effects that process inputs have on quality attributes (QA) and process performance indicators (PPI). DOE analysis is a useful tool to identify the inputs that have an effect on the QA/PPIs, but it is mainly quantitative.

In addition to the traditional DOE analysis, calculation of the impact ratio (IR) for each input provides a quantitative and qualitative assessment and can aid in the assignment of a parameter as being critical or not. The IR provides a measurement of the effect size relative to an acceptable range.

Doing the calculations manually is time-consuming and prone to error. We will present an automation tool that can extract the required information from a DOE model and compute the IR. An interface allows the user to customise how the results are calculated.

 

Hello, everyone. I'm really happy to be here today with my colleague Sam. Today, we'll be talking about impact ratios. Basically, what is the impact of your DoE factors on your DoE outputs or responses? I'll introduce the concept, the calculations, and examples in JMP, and then I'll pass things on to Sam, who will take you through automating that process using JSL and show you how he's done this in an add-in.

First of all, confession time. I really wanted to call this talk, What Do Process Characterization Scientists Have in Common with NASA Scientists? But the link was too tenuous, and maybe it would have been a bit mystifying. Nonetheless, I still use the concepts here. This is why we have meteor pictures on the first slide. This is my metaphor: I'm looking at impact craters and comparing them with impact ratios, because we need to have a little bit of fun.

What is happening here? We have a very similar meteor, and it could fall on the Earth or Mars or the Moon, and the conditions would be different. Here you have, first of all, an atmosphere, very little atmosphere, or no atmosphere at all, which changes the blast, which changes the behavior of the meteor. If you are on the Earth, the impact crater that forms could have a really massive impact on what's going on around it, because on Earth, we have people living nearby.

With this in mind, what are we going to go through today? We'll look at yet another statistical ratio. Statisticians like their ratios. Then we'll look at what the impact ratio specifically is measuring. Then we'll answer this question: "Can I calculate this in JMP?" The answer is, obviously, you can, and you can do that a number of ways. You can do it the painful way, and I will take you through this manually. I will skip over one step because we don't have time, but we have created a workflow to make things a little bit easier with table management.

Then I'll pass things on to Sam, who will show you the happy way of calculating the impact ratio by clicking on his JMP add-in. Here we have a bad hand drawing that I did myself to show you something that you have probably seen many times, because that's a control chart, and they are quite omnipresent in the statistician's world. Here we have an output, and it's changing over time as we create more batches, for example.

We have zones on this chart. The bluish-purple zone is where our output falls. Because this is normally distributed data, we can make a prediction about where it will fall as time goes past. Here we have statistics giving us a prediction of how wide this blue zone might become. Then in green, between two specifications, we have a customer type of safe zone. Customers tend to give us specifications, or we might have calculated an acceptance criterion from developmental work, for example.

What we want to do is make sure that our process is sitting in a zone that's well contained by the safe zone when we are doing process control. One ratio that's used for this is the capability index; the Cpk or Ppk is basically ratioing this blue zone against this green zone. This is the closest thing I found to the impact ratio. Here we have almost the same graph. You can tell I have reused the same drawing. The only thing that changes here is that this time, instead of having statistics predicting the spread of the data, we have statistics predicting the shape of the data. We're modeling what happens to one specific output when you change the input and move its value from low to high.
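For reference, this is the standard definition of the capability index being alluded to here, where μ and σ are the process mean and standard deviation, and LSL and USL are the lower and upper specification limits:

```latex
C_{pk} = \min\!\left( \frac{USL - \mu}{3\sigma},\; \frac{\mu - LSL}{3\sigma} \right)
```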

This is what you would do if you were doing a DoE and you changed the operating condition systematically. More particularly for process characterization, this low to high would be a range that you want to prove is acceptable. Here we have an equation in the end, and we have a blue zone, the min to max predicted for our response, and we still have a green zone, what's acceptable to give us quality product at the end of the process.

We are checking, again, if the deviation is occupying an acceptable proportion of our safe or play area. This impact ratio can help us compare the impact of different factors because we will get such a model for all of our factors. It could also be a criterion for classifying those factors if this is part of a DoE. What is this impact ratio really measuring? You had a clue on the slide before, but we're back to meteorites.

Our NASA scientists have calculated two things: the place where our meteorite will impact the planet, and the radius of the impact crater. They have said, as long as we are well within this green safe area, whoever lives on the outside will be fine. Of course, that would not be true for a meteorite, but just stay with me here for a second. What we're ratioing here is basically the radius of the crater to the radius of the safe area. We're hoping that this number is much smaller than this one. That the safe area is very big compared to the impact crater.

Now, what happens is that we could be off target: the NASA scientists predicted that the meteorite would fall here, but it fell here. Even though the safe area was probably big enough for a centered meteorite/process, in this case here, we actually have the impact crater well outside of the safe area. I hope you don't live around here. Another possibility, and there are lots of other possibilities, would be that we are on target here, but we have miscalculated the crater. The crater is much bigger than what we thought it would be. Again, it's outside of the safe area, and the radius of the safe area is actually smaller than that of the crater. This is just a picture, so let's see how this would look on a graph. Same graph again with some extra arrows here and some equations.

In plain English, your impact ratio really is ratioing the effect size over the distance to your specification or acceptance criterion. The effect size here is those skinny blue and orange arrows: the distance from the minimum prediction to the center, or from the maximum prediction to the center. We are ratioing this to the distance from the center to your specification on either side. In this case here, this distance occupies 70% of our safe zone. That is not a good impact ratio. Here it's only 40%, which is still a pretty high figure. But to be fair, we're going to have to take the maximum of those two values, so we would report 70% here.
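In symbols, with CP, Min, and Max the center point, minimum, and maximum predictions, and LSL and USL the lower and upper specification or acceptance limits, the two ratios and the reported value are:

```latex
IR_{lower} = \frac{CP - Min}{CP - LSL}, \qquad
IR_{upper} = \frac{Max - CP}{USL - CP}, \qquad
IR = \max\left( IR_{lower},\ IR_{upper} \right)
```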

This is my hand drawing; let's go to an actual JMP chart. This is just one panel of a profiler that you would get at the end of finishing up a DoE analysis. In this case, we have a bit of curvature, but the zones are the same. If we want to calculate our impact ratio on the minimum side, the distance from center to the minimum prediction is taking up about 42% of the distance from center to the lower specification. On the other end, the distance from center to the maximum prediction is only taking up about 7% of our safe zone.
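Plugging those two numbers into the rule above gives the value that gets reported:

```latex
IR = \max(0.42,\ 0.07) = 0.42
```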

Nonetheless, we're going to have to report this number, and it's still a pretty high number. The question we're trying to answer is: are we on target, with a small impact compared to what is acceptable? Now, I will exit from here and go into JMP to show you how to carry this out manually, the painful way. Here we have a bog-standard DoE, five factors. We've only shared four responses here. It's all anonymized, so I'm very sorry, but the numbers look a bit funny because they're all between minus one and one.

But what we're looking for here is that we have response limits specified, so that we can play around with the goal for those responses: Maximize for most of them, while something like impurities might be minimized, and we might not even be interested in how high they go. I'm not going to go through the modeling because that's not what we're here for. Sorry about that. Wrong script. There we go. We've already fitted this, and this is the example I had in the notes, so I'm going to keep with this one.
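As a side note, the goals mentioned above live in the Response Limits column property, which can also be set in JSL. A minimal sketch, where the column name and limit values are placeholders rather than values from this demo:

```jsl
// Hypothetical example: give a response column a Response Limits
// property with a Maximize goal, so the Profiler desirability uses it.
Column( "Yield" ) << Set Property( "Response Limits",
	{Goal( Maximize ), Lower( -1 ), Middle( 0 ), Upper( 1 ), Importance( 1 )}
);
```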

Here we are using the Profiler to get all of the numbers we need for our equation. We're setting this at the center point, and we're going to record that number. There we go. Those are the center point conditions; everything is kept at the center point here. I'm going to ask JMP to remember this setting, and I'm going to call it Load Center Point because I'm interested in the load here. That's my first one. Then, manually, I wouldn't really need to ask JMP for help here, because everything is at the center point for those two. I could just move this here to the minimum and record this again as the minimum value. That's my load min.

For the maximum, it's a little bit trickier. I could try to do this: I could make this bigger and try to make sure I get to the maximum here. But I'm going to trust JMP for this one. I'm going to hold Ctrl+Alt and click on the other two graphs here, and I'm going to lock those factor settings at the center point, because I'm only interested in what the load is doing and what the maximum is for the load when everything else is kept at the center point. Then I'm going to use the maximizer in JMP.
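In script form, those clicks boil down to something like the sketch below. It assumes prof already references an open Prediction Profiler; Maximize Desirability is a standard Profiler message, but the lock and remember message names and arguments are assumptions mirroring the menu items used in the demo, and the factor names are placeholders:

```jsl
// Hypothetical sketch of the manual steps above, not verified code.
prof << Lock Factor Setting( :Factor2, 1 );  // hold at center point (assumed message)
prof << Lock Factor Setting( :Factor3, 1 );  // hold at center point (assumed message)
prof << Maximize Desirability;               // let JMP find the maximum over :Load alone
prof << Remember Settings( "Load max" );     // save under a meaningful label (assumed)
```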

There we go. I could have reached that by hand, but now I know it's exactly at the maximum. I'm going to save this one as well. We have the max. Now I have almost everything I need. If you're doing this manually, you're doing this for every factor or equation parameter on this list, for every response. If you have fairly big models, and you have maybe 5–10 responses, this becomes a lot of work very quickly.

Then you would have to export all this data into a table. You would use this option if you had many remembered settings done, but I only have one, so I'm going to do this instead. Here you have all the info you need, but in the wrong format. What we need to keep is the labels here, and then we want this number here. Now, you would use Transpose from the Tables menu, but I'm not going to, because I've already done it ahead of time. I just want to quickly share that with you. Why is it not happy? Here we have the transposed data, with all the labels from the table.
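If you did want to script that table step too, it might look something like this rough sketch, where dt is the exported settings table and the column names are placeholders:

```jsl
// Hypothetical sketch: pivot the exported settings into one row per
// setting label. Column names are placeholders, not from the demo.
dt << Transpose(
	Columns( :Value ),   // the predicted responses
	Label( :Setting )    // e.g., "Load Center Point", "Load max", ...
);
```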

Manually, we would have to add our lower and upper spec. From these, together with the min, the max, and the center point, we can calculate the numerator for the ratios, the denominator for the ratios, and the ratios themselves. The small distance over the large distance gives you your lower impact ratio. Then you simply use a formula to get the maximum of the two, and you'd have to repeat that for every one of them. I'll leave the floor to Sam now, who will show you how much easier this is to do once you have an add-in for it. Sam?
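Those last few columns are plain formula columns. A minimal sketch of what they could look like in JSL, with placeholder column names:

```jsl
// Hypothetical formula columns for the impact ratio calculation.
// :CP, :Min, :Max, :LSL, and :USL are placeholder column names.
dt = Current Data Table();
dt << New Column( "Lower IR", Numeric, Continuous,
	Formula( (:CP - :Min) / (:CP - :LSL) ) );
dt << New Column( "Upper IR", Numeric, Continuous,
	Formula( (:Max - :CP) / (:USL - :CP) ) );
dt << New Column( "Impact Ratio", Numeric, Continuous,
	Formula( Maximum( :Lower IR, :Upper IR ) ) );  // report the worse of the two
```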

Thanks, Gwen. As Gwen mentioned, we decided to create an add-in to automate this task. I'll just give a quick demonstration now of how the add-in works, using the same data table that Gwen has just been working with. The reason we decided to make an add-in was that it's easier than running a script directly from a script window. Another nice thing about having an add-in is that you can add a tooltip that appears when you hover over the add-in name. In this case, the tooltip just tells you the correct window that the add-in has to be run from.

In fact, if I try clicking it, you'll see that nothing happens, because I'm not in the right place. If I now open up the report with those models that we made earlier, you can see that I can now run the add-in, and it runs properly. First of all, the user is presented with some windows that have some instructions and, later on, some areas for user input. The first window just has some instructions around requirements for the underlying data table.
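That guard behavior might be implemented with a check like the following sketch; Current Report and report subscripting are standard JSL, but the outline-box title is an assumption:

```jsl
// Hypothetical guard: only proceed when the front window is a report
// containing a Prediction Profiler; otherwise do nothing visible.
prof = Try( Current Report()["Prediction Profiler"], Empty() );
If( Is Empty( prof ),
	Throw()  // silently stop, mirroring "nothing happens"
);
```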

For example, the factor columns have to be coded, and any units have to be input as column properties as well. If that's not the case for any of the columns, you can just hit Cancel, go back, and make those changes. But in this case, I know that they are all coded correctly, so I'll hit Run again and then click OK. The next window has the area for inputting the factor settings. You'll notice that we have an input here for categorical factors.

It's not possible to calculate an impact ratio for a categorical factor. However, any categorical factors contained within the models have to be fixed at a particular setting, so this just allows the user to input that here. For the remaining factors, the continuous factors that were evaluated in the DoE, you can see that we have an area for inputting those settings. The script has automatically read in the minimum and maximum values from the table. It has also calculated the center point as the midpoint between the minimum and maximum.
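A minimal sketch of how such a dialog could be built, assuming a single continuous factor column called Load; the layout and names are placeholders, not the add-in's actual code:

```jsl
// Hypothetical factor-settings dialog with the center point defaulted
// to the midpoint of the observed range.
minVal = Col Min( :Load );
maxVal = Col Max( :Load );
dlg = New Window( "Factor Settings", << Modal,
	Table Box(
		String Col Box( "Factor", {"Load"} ),
		Number Col Edit Box( "Min", {minVal} ),
		Number Col Edit Box( "Center Point", {(minVal + maxVal) / 2} ),
		Number Col Edit Box( "Max", {maxVal} )
	),
	H List Box( Button Box( "OK" ), Button Box( "Cancel" ) )
);
```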

However, in some cases, the center point might not be at the exact center of the range. Here, that was the case for load, so I'll change that to 20. It's possible to edit any of these values. The benefit of being able to change the input here is that if we run the add-in and get impact ratios that are too high for some of the factors, we can just run the add-in again, change the values here to evaluate a reduced factor range, for example, and then see if the impact ratios look any better at that new range.

But I'll keep the rest of the values the same and click OK. Then the final window has the input for the response acceptance criteria. I'll put those values in now. If there is only one criterion, you can just put that one in and leave the other blank. If you have no criteria, you can leave both boxes blank, and it will still be able to calculate the impact ratio. In that case, it's just the percentage difference between the set point response and the minimum or maximum prediction. It doesn't quite offer the same measurement in terms of practical significance, but it will still be calculated by the script.
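One plausible reading of that fallback, as a sketch; the variable names are placeholders:

```jsl
// Hypothetical handling of a blank lower acceptance criterion: fall
// back to the relative change from the set point prediction.
lowerIR = If( !Is Missing( lsl ),
	(cp - minPred) / (cp - lsl),      // usual lower impact ratio
	Abs( minPred - cp ) / Abs( cp )   // percentage difference fallback
);
```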

Now, if I click OK, you can see that the summary table has been generated. But if I go back to the window I ran the add-in from, you can see that underneath each prediction profiler, the settings have been remembered. These are the settings that were used to obtain the minimum and maximum prediction for each factor in the model. This is useful so that you can go back and see how the calculations were made. It's good to be able to review that.

But now I'll just give a very general overview of how the script actually works. First of all, the script loops through each response model in the window and sets the desirability to maximize the response. The script then loops through each term in the model. For the first term, which is load in this case, it unlocks the factor settings so it is free to move, and all of the other factors are locked at their set point settings. The script then executes the Maximize and Remember function. You can see that the setting has now been saved underneath the Profiler, and that we've maximized this response by changing only the load factor.

It continues that operation for each term in the model. The desirability is then set to minimize the response, and the process is repeated, so each term is evaluated again to get the minimum prediction. Finally, all of those remembered settings are updated so that the names are meaningful: we show the factor name and whether the goal was to minimize or maximize the response. This entire process is then repeated for each response model contained in the window. That's how the script works. I'll just return now to the summary table that was output.
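A hypothetical skeleton of that loop is below. The lists profilers and terms and the lock message name are assumptions; Maximize and Remember is the function named above, and the desirability switch is left as a comment because the exact message may vary by JMP version:

```jsl
// Hypothetical skeleton of the add-in's main loop, not verified code.
For( r = 1, r <= N Items( profilers ), r++,
	prof = profilers[r];
	// ...set the desirability to maximize the response here...
	For( t = 1, t <= N Items( terms ), t++,
		// free the current term; lock every other factor at set point
		For( k = 1, k <= N Items( terms ), k++,
			prof << Lock Factor Setting( terms[k], k != t )  // assumed message
		);
		prof << Maximize and Remember;
	);
	// ...then set the desirability to minimize and repeat the inner
	// loop, and finally rename the remembered settings meaningfully...
);
```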

You can see here that all the data gets collected and output into this table. Each row contains one particular factor for one response model that was evaluated. We have columns for the acceptance criteria that were input by the user. There are then columns showing the prediction at the set point settings, and the minimum and maximum prediction for each factor contained within that response model. The remaining columns are just formulas. Here we've got the difference between the minimum prediction and the set point prediction, and, similarly, between the maximum prediction and the set point prediction.

Then the last two columns are the calculated impact ratios: the lower impact ratio and the upper impact ratio. You'll notice that where we only have one criterion, as for impurity in this case, which only has an upper acceptance criterion, we only calculate an upper impact ratio. The final column is just the overall impact ratio, which is simply the maximum of the lower and upper impact ratios. I'll just finish off this part of the talk by summarizing the benefits of using an add-in to automate this task in JMP.

Firstly, using the add-in is much quicker than doing it manually. Secondly, it allows the task to be repeated much more easily. As I mentioned, if you run the add-in and get impact ratios that are considered to be too high, you can just rerun the add-in, change the factor settings, and see if you can obtain acceptable impact ratios. Finally, there's much less chance of any errors occurring, because you don't need to do any transcription or manipulation of the data; the script collects all the data together and puts it into the table that gets output at the end.

Okay, thank you. That concludes my section of the talk. I'll hand over to Gwen to summarize things. Thank you.

Just in addition to what Sam said, a big advantage of the add-in is that you can repeat the calculations and change what you input in the first couple of windows. You could change the factor settings if you wanted, for example, to bring them in toward your proven acceptable range, which presumably sits a little bit outside your normal operating range. The other thing you could change is your specification or acceptance criteria. If your impact ratios were really high, then your safe zone would have been maybe a bit too small to be comfortably operating in.

You could push the specifications out and check whether all your impact ratios become a bit smaller. You could probably even use this as a justification for changing those acceptance criteria. Another thing that you can do with those impact ratios is use them collectively as a criterion to classify your tested process parameters: a process parameter could be classed as critical, highly critical, or non-critical at the end of the DoE, for example because its impact is very small. I think that concludes what we had to say about impact ratios today. We're both available to answer questions if you have any. Thank you very much.