
Comparing optimization strategies in complex multi-dimensional processes

There have been numerous studies showing the efficacy of different strategies in process optimization. The most common comparison is between 'one factor at a time' (OFAT) experiments and a 'design of experiments' (DOE) approach. When faced with an unfamiliar, high-dimensional process space (e.g., >10 factors), researchers often resort to OFAT methods because they are easy to interpret. Generally, it would be cost-prohibitive and logistically challenging to run multiple experimental campaigns toward the same objective just to evaluate which strategy outperforms the others. To circumvent these issues, we used a Polymerase Chain Reaction (PCR) simulator with 12 unfamiliar continuous and categorical factors to explore these questions.

Our team brings decades of experience in process optimization in the electronic materials industry (as former employees of Apple and others). We intentionally sought out a simulator from a research area completely unknown to us that can simulate a large number of factors and their complex interactions on multiple responses. To automate experimentation, we used a Python web automation script. With the simulator and the script, we could run many experiments while mimicking the real-life constraints and experimental budgets we have seen in our own professional careers. While adhering to run-budget rules, we compare the efficiency and accuracy of four strategies: two OFAT-type strategies as commonly used in industry, and two strategies from the DOE and advanced-DOE genre. JMP is used for all experimental analyses and modeling, and an objective attempt is made to compare the strategies.

 

 

Hello and welcome to this webinar on Comparing Optimization Strategies in Complex Multi-dimensional Processes. My name is Asit Rairkar, and my collaborator is Jeffrey Kelly. This talk is for the 2024 JMP Discovery USA conference in October. Here is the outline of our talk. We'll introduce ourselves, discuss our motivation and approach, give a brief description of the methodology and background for the four approaches to process optimization that we took, and then present our results, discussion, and suggestions for future work, which is more of an open-ended invitation for people to discuss with us what else we can do here.

Again, my name is Asit Rairkar. I'm the President of Samaasa Consulting, a JMP partner for training and consulting in JMP and process optimization. I've worked for almost two decades in the Bay Area and have had leadership roles at Apple, Applied Materials, MiaSolé (a solar startup), and Novellus, which was later acquired by Lam Research. I've worked in process and product development in batteries, photovoltaics, and semiconductors, mostly in thin films and electronic materials. I've been a JMP/DOE evangelist for 20-plus years. My love for DOE started when I took DOE courses under Dr. Montgomery, who wrote the classic textbook, at Arizona State, where I was a PhD student. We'd love for you to contact us. Both our LinkedIn profiles are included here, and my email is also included: asit@samaasa.com. We'd love to hear your critiques, comments, suggestions, and improvements, or just to have discussions on this topic or anything else you'd like to discuss with us.

With that, I'll let Jeff introduce himself.

My name is Jeff Kelly. My background is in process engineering, in batteries and semiconductors, mostly focused on thin-film processes. I've worked at Switchback Systems, Apple, a small battery company called IPS, and Micron Technology out in Boise, Idaho, where I started in semiconductors. That's where I started using JMP for DOEs and other analysis. I've been working with JMP for about 20 years; I think my first license was JMP 4. Back to Asit to start the talk.

Thank you. We worked together at Apple, and we've known each other for 10 years now. We've collaborated and had discussions on many topics similar to this one, and we love that we are working on this together again.

The motivation for our talk is very simple. This is a common topic that has been discussed many times, but we still encounter clients and teams in big and small companies that want to do one-factor-at-a-time experiments, especially when they are dealing with a large number of factors and complex process spaces, because they find one factor at a time much easier to explain and execute. Another motivation is that the OFAT-versus-DOE comparison is usually done or presented for a small factor set; it's very hard to do and present this comparison for a large factor set. Also, not many different variations of OFAT and DOE are typically compared. Lastly, the biggest motivation is that this is the comparison I've always wanted to do throughout my career. But as you can imagine, if I had asked any exec team at any of my companies, "Hey, can you give me four times the budget to try four different approaches to solve the same problem, purely as an academic exercise?", nobody would have given us that budget.

We really wanted to do an objective comparison of OFAT versus modern DOEs, especially when dealing with a complex, non-monotonic process like the one shown in the picture on the right, with lots of complex ridges and valleys. When domain experts tell you that the process may look like this, engineers and scientists get leery of DOE, resort to one-factor-at-a-time experimentation, and become very cautious. That's the motivation. With that, I'll hand it over to Jeff to explain our approach and walk us through our OFAT approaches, and then I'll take over for the DOEs. Thank you.

Since doing this in real life would be cost-prohibitive, we obviously needed a simulator, and we ended up finding this PCR simulator online. It's web-based, and we were able to use it to compare the four strategies. It takes in 11 continuous factors and one categorical factor, and it produces two continuous responses. For us, this is an absolute black box: we are not subject matter experts in PCR, don't know the technology, and don't even know the chemistry behind it all that well. That gave us a good opportunity to use a lot of discipline in the DOE and not fudge the numbers as we were putting them in to get a response we wanted. On top of that, we also coded the inputs and the outputs, so the analysis in JMP was a little more black-box as well, giving us the results and the tests we wanted in order to compare the different strategies. To be clear, when doing this in the real world, it is nice to have subject matter expertise in the areas you want to run tests on.

It allows for a lot of guidance in experimental design and will help you make better decisions. But in this case, I think it offered a good comparison. Next slide.

We did want to note how this is replicated in the real world and how our simulator either reflects that or differs from it. One of the main reasons we picked a simulator is the limited run budget we see in the real world. If you were to do a full factorial design on 12 factors like our simulator has, it would be over 4,000 runs. Even with the simulator and automation, that would be difficult and would take a long time to run. Instead, we limited our experimental implementation to 30 runs for each strategy, which reflects what a budget might be in the real world. The next real-world scenario we run into is that categorical factors are sometimes avoided or settled up front by engineers; in a process space, you might decide on the manufacturer of a chemical before you optimize around that chemical. Some people avoid categorical factors because they seem to complicate DOEs in a lot of ways. In that case, for our OFAT (one-factor-at-a-time) designs, we selected these first and just ran them up front.
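As a back-of-the-envelope illustration of why the run budget matters (assuming, purely for the arithmetic, two levels per factor for the 11 continuous factors and two levels for the categorical factor):

```python
# Rough run counts for a 12-factor process space.
# Assumes two levels per factor (11 continuous + 1 two-level categorical).
n_factors = 12
full_factorial_runs = 2 ** n_factors          # 4096 runs
run_budget = 30                               # budget used for each strategy in this study

print(f"Full factorial: {full_factorial_runs} runs")
print(f"Budget per strategy: {run_budget} runs "
      f"({run_budget / full_factorial_runs:.1%} of the full factorial)")
```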

The other thing with an OFAT design is that the order in which you run the factors is arbitrary, so we chose to start with the categorical factor. In the real world, you also have factor constraints. Those are built into your process or your equipment, your substrates, your vessel size, things like that. In our simulator, those are imposed by the simulator itself; they're related to the volume of the chemicals contained within the PCR reaction, and through some experimentation we were able to find the equation for the factor constraint. That limited the range of the input parameters we could use.
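The exact constraint equation is specific to the simulator, but as a hedged illustration, a volume-type constraint can be enforced when vetting candidate runs before they are executed. The factor names and the 25-unit cap below are hypothetical placeholders, not the simulator's actual equation:

```python
# Hypothetical example of a factor constraint: total reagent volume must not
# exceed the reaction vessel capacity. Names and the cap are placeholders.
MAX_TOTAL_VOLUME = 25.0  # assumed vessel capacity, in the simulator's volume units

def satisfies_volume_constraint(run: dict) -> bool:
    """Return True if the summed volumes of the liquid factors fit in the vessel."""
    volume_factors = ["template", "primer_fwd", "primer_rev", "dntp", "polymerase", "buffer"]
    total = sum(run[f] for f in volume_factors)
    return total <= MAX_TOTAL_VOLUME

# Example: screen a candidate design table (list of dicts) before running it.
candidates = [
    {"template": 2.0, "primer_fwd": 1.0, "primer_rev": 1.0, "dntp": 0.5,
     "polymerase": 0.25, "buffer": 18.0},
]
feasible = [run for run in candidates if satisfies_volume_constraint(run)]
print(f"{len(feasible)} of {len(candidates)} candidate runs are feasible")
```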

One thing we run into in the real world as well is experiment time versus metrology time. It may take an hour or a day to run an experiment and a week to get metrology on it, depending on the priorities of the lab you're sending it to. The simulator changes things: it gives us results immediately, and we can then analyze that data nearly immediately in JMP as well. Things are sped along a little bit there. It could also work the opposite way, so those are factors we considered.

Failed runs. In the real world, you have failed runs and missed runs. It could be power outages, it could be equipment downtime; technicians go on vacation or input something wrong. Anything can happen in the real world. In our case, any experiments that were missed we were able to backfill using automation. That's covered in the next bullet point, recipe management. In the real world, you may or may not have recipe management available to you. Ideally, you would have a system where you could input values, and those values would be downloaded directly to the machine, which would run your experiments exactly as you programmed them. In our case, we were able to emulate that with a Python script, using a library called Selenium, to automate all of our inputs; this mimics recipe management for our purposes (a sketch of this automation follows after the next paragraph). In the real world, you can also have extreme settings that affect equipment performance. For something like a power supply in a sputter system, if you're running at its highest and lowest settings, it's going to skew your results.

You're not going to get good performance. When you're running tests on systems like that, and many other systems, you avoid those upper and lower limits of your process equipment. We avoided that in our simulator system as well, for similar reasons, and those factor limits are given in the appendix slides.
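As a rough sketch of what the recipe-management automation looked like (not our exact script; the URL, element IDs, and field names here are hypothetical, since the real simulator page has its own markup):

```python
# Minimal Selenium sketch of automating simulator inputs from a design table.
# The URL, element IDs, and field names are hypothetical placeholders.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

SIMULATOR_URL = "https://example.com/virtual-pcr"  # placeholder URL, not the real simulator

def run_recipe(driver, recipe: dict) -> str:
    """Clear each input field, type in the recipe value, submit, and read the response."""
    driver.get(SIMULATOR_URL)
    for field_id, value in recipe.items():
        box = driver.find_element(By.ID, field_id)
        box.clear()                      # avoid carrying over values from the previous run
        box.send_keys(str(value))
    driver.find_element(By.ID, "run-button").click()
    time.sleep(1.0)                      # crude wait; a WebDriverWait would be more robust
    return driver.find_element(By.ID, "result").text

driver = webdriver.Chrome()
design = [{"x1": 1.5, "x2": 60, "x3": 30}]   # placeholder rows from a design table
results = [run_recipe(driver, row) for row in design]
driver.quit()
```

Clearing each field before typing is the step that prevents the carried-over values mentioned later in the talk.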

Then there's also the issue of noise in the real world. Anything can introduce noise into a process system: man, machine, method types of things. The simulator itself has built-in noise. Within the programming itself, we discovered that even given the same input factors, there is a small percentage of error or noise in the output. We got to experience that as well. Next slide.

We used four different types of experimental designs. The first, OFAT1, is a sequential one-factor-at-a-time approach: we ran different settings, changing one factor at a time as we moved along. The second, OFAT2, is also one factor at a time, but we grouped combinations of factors and then optimized at the end of each group with a new baseline. The third was a definitive screening design (DSD), and the fourth a space-filling design (SFD).

We'll cover OFAT1 in a little more detail. Like I said, we tweak one parameter at a time, set each one to its optimum, and continue the same strategy with the next factor. If you look to the right, there's a table there; it's a hypothetical experiment that we put together. These are not real results, it is purely hypothetical. For the first run of the experiment, we're looking at X4: we compare L1 to L2 and find that L1 has a higher value (we're optimizing for Y). We carry that L1 setting down to our comparison between the plus and minus settings of X1 and get an optimum of 110, leaving X1 at plus and X4 at L1. We then take a look at X2 and compare its plus and minus settings. This one's interesting because the optimum setting for X2 happened to be the zero setting, which we already ran in the previous comparison, so that's 110.

We set X2 to 0 and compare the plus and minus of X3, finding the minus setting of X3 to be the optimum at 115. We set all of these to their optima for a new baseline verification run and get an output of 120, which is our high optimum for this set of runs.
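A compact sketch of the sequential OFAT logic described above; the factor list, levels, and run_experiment function are hypothetical placeholders (in our study, that call went to the PCR simulator):

```python
# Sequential OFAT (OFAT1) sketch: vary one factor at a time, lock in the best
# level found, and carry the winners forward to the next factor.

def run_experiment(settings: dict) -> float:
    """Placeholder response; in our study this was a call to the PCR simulator."""
    numeric = sum(v for v in settings.values() if isinstance(v, (int, float)))
    return numeric + (1.0 if settings.get("x4") == "L1" else 0.0)   # dummy synthetic Y

factor_levels = {
    "x4": ["L1", "L2"],          # categorical factor, screened first
    "x1": [-1, 0, +1],
    "x2": [-1, 0, +1],
    "x3": [-1, 0, +1],
}
best = {"x4": "L1", "x1": 0, "x2": 0, "x3": 0}    # starting baseline

for factor, levels in factor_levels.items():
    scores = {}
    for level in levels:
        trial = dict(best, **{factor: level})     # change only this one factor
        scores[level] = run_experiment(trial)
    best[factor] = max(scores, key=scores.get)    # lock in the winning level

confirmation_y = run_experiment(best)             # final baseline verification run
```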

We move on to OFAT2. In a similar way, we're doing this one factor at a time, but we're going to group the factors. We tweak factors one at a time around a center baseline, set a subgroup of factors to their optima to create a new baseline, and continue the same strategy with the next subgroups. If you look at the chart on the right, again a hypothetical experiment, we're going to look at X1 and X4 in the first group, starting with X4. We compare L1 and L2, the categorical levels of X4, and get a high of 105. We then reset to our baseline for the comparison of the plus and minus of X1 and find the minus to be the optimum. We establish a new baseline using the minus for X1 and L2 for X4, test that, and get an output of 108; that's our new baseline. Then we continue on to the next subgroup and compare the plus and minus of X2.

We get a high optimum of 115. We reset to baseline, run X3 as a comparison between plus and minus, and get a high of 114. Now we have X2 and X3 at their plus settings as the optima for those factors, and we carry that into a new baseline. Our optimum and preferred setting across all the factors is minus, plus, plus, L2, at a high of 125 for the output. Then, like I said, Asit is going to go over the definitive screening design and the SFD next.
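The grouped variant differs mainly in when the baseline is reset and verified; a minimal sketch of that flow, again with hypothetical names, levels, and a placeholder run_experiment:

```python
# Grouped OFAT (OFAT2) sketch: within each subgroup, vary factors one at a time
# around the current baseline, then combine the winners into a new baseline and
# verify it before moving to the next subgroup.

def run_experiment(settings: dict) -> float:
    """Placeholder response; in our study this was a call to the PCR simulator."""
    numeric = sum(v for v in settings.values() if isinstance(v, (int, float)))
    return numeric + (1.0 if settings.get("x4") == "L2" else 0.0)   # dummy synthetic Y

subgroups = [
    {"x4": ["L1", "L2"], "x1": [-1, +1]},    # first subgroup (categorical screened here)
    {"x2": [-1, +1], "x3": [-1, +1]},        # second subgroup
]
baseline = {"x4": "L1", "x1": 0, "x2": 0, "x3": 0}

for group in subgroups:
    winners = {}
    for factor, levels in group.items():
        scores = {level: run_experiment(dict(baseline, **{factor: level}))
                  for level in levels}                 # one factor varied around the baseline
        winners[factor] = max(scores, key=scores.get)
    baseline.update(winners)                           # new baseline from the subgroup winners
    baseline_y = run_experiment(baseline)              # verification run of the new baseline
```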

Thank you, Jeff. Those were our approaches to one factor at a time, as both of us have encountered over our careers and seen other groups use. We also used modern DOEs; we took two approaches. The first was the definitive screening design. The definitive screening design was an advancement on saturated designs, introduced by Bradley Jones of SAS, JMP's parent company, in 2011, and DSDs are quite powerful for how lean they are compared to classic saturated designs like Plackett–Burman. They are very versatile, especially for a large number of factors and mixed continuous and categorical factors. Their biggest advantage is that they are able to extract main effects without confounding, and they also have the potential to extract some two-factor interactions and some second-order curvature, even with a lean experimental budget. They are especially powerful when coupled with subject matter expertise: when you know which factors are going to have quadratic effects or two-factor interactions, you can prioritize them. They're also amenable to sequential experimentation, where you can first weed out non-significant factors and then build more experimentation to further optimize your process.

We also experimented with space-filling designs, which are newer designs as well. A caveat here: this is only a 30-minute talk, so we don't have time to go into the nitty-gritty of how DSDs and space-filling designs are set up and how they work under the hood, and we also don't have time to go into a deep analysis. We have recorded a demo, available on our website, that goes through how the designs are set up, how you run the experiments with the virtual PCR simulator, how we got our data, and what analysis we did. There are also plenty of articles and videos available on community.jmp.com on how DSDs and SFDs work. But if you have any questions about how these work, or you just want to have discussions, critiques, or feedback, please write to us; we'd love to discuss. Again, the demos are available for both DSD and SFD on our website at this link. Moving on to space-filling designs: these are a special category of designs where the computer algorithm creates points in the 12-factor hyperparameter space, scattered in a pseudo-random fashion such that each point is spread as far away from the other points as possible, so that the entire space is filled.
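JMP builds its space-filling designs with its own algorithms, so the snippet below is not what JMP does internally; it is only a minimal illustration of the space-filling idea, generating a Latin hypercube of 30 points over 11 continuous factors and scaling them to assumed factor ranges:

```python
# Illustration of the space-filling idea with a Latin hypercube sample.
# This is NOT JMP's algorithm; the bounds below are assumed placeholders.
import numpy as np
from scipy.stats import qmc

n_runs, n_continuous = 30, 11
lower = np.zeros(n_continuous)           # assumed lower bounds for the 11 continuous factors
upper = np.full(n_continuous, 100.0)     # assumed upper bounds

sampler = qmc.LatinHypercube(d=n_continuous, seed=42)
unit_points = sampler.random(n=n_runs)               # points spread out in the unit hypercube
design = qmc.scale(unit_points, lower, upper)        # rescale to the real factor ranges

# The categorical factor can be assigned separately, e.g. alternating levels:
categorical = np.tile(["L1", "L2"], n_runs // 2)
print(design.shape, categorical.shape)               # (30, 11) (30,)
```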

They are a little complicated to explain and understand, and that's why most people avoid space-filling designs, but coupled with modern analytic methods and modeling techniques such as neural nets and machine learning, they're really powerful. The one trade-off with a space-filling design is that without recipe management software, it is extremely complicated to execute. For example, if your experimental design team is different from your experimental execution team, they may be really confused by this scattered matrix of design points: they may enter wrong values, or they may forget to clear values left over from the previous run's factor combinations, and those will carry over and give you runs that were not intended in your design matrix. So if you're running space-filling designs, recipe management software is definitely recommended. These are the two types of DOEs that we compared to the two types of OFATs. With that, I'll hand it over again to Jeff to discuss our raw results. Here are the raw results.

I think this is where the results start to be fun. On the right-hand side, if you take a look at that graph, we're comparing the two OFATs on the left, in red and blue, to the two more modern DOE methods on the right, in purple and green. On the X-axis, the bottom of the graph, we have our run numbers, and on the Y-axis we have the responses, Y1 and Y2; those are what we're optimizing for. Looking at those quickly, we can see that as we move along in run number, both OFATs tend to increase, because we're carrying the optimized values forward to the next run; we see that increase in our output, our desired optimum. Comparing that to the DOE methods, and this is what I thought was interesting, they look pretty chaotic, and it doesn't look like we're making any progress: even though we've had some higher values, it looks like we're not doing much learning here.

This is a case where, if you had an executive who midway through your experiment wanted to see some results so they could make decisions, they're not going to be happy with what they see in the DOE being performed. It's going to look fairly chaotic, whereas with the OFAT, even though we're not reaching as high an optimum, they can see progress. The big benefit we get out of DOE is that it gives us real insight into what's happening in the process. We can build a model based on that. That's not something you get out of OFAT: you run to an optimum, and that's where you put your setting. With a DOE, you get a model, mathematical equations. You can input the values you want, design toward an optimum, and know where you are. It gives us a view into more complex process spaces and more understanding of what's actually happening.

With that, we'll jump into the results of the DSD. Again, we don't have time to go into a whole demo of the DSD and SFD; that demo is recorded on our website and linked for your convenience. As Jeff said, the advantage of the DOEs, the DSD and SFD, is that you get a model equation for your entire process space, which gives you a better understanding of how your process space is actually working. The DSD in JMP comes with an automated analysis script: once you enter the data, you can just click the analysis script, as shown in the demo. At the first stage, the DSD analysis weeds out any non-significant main effects. In this case, you can see that for Y1, X1, X3, X4, and some others are weeded out, whereas for Y2, X3 and X4 are weeded out.

You can also see that the DSD gives you a profiler, which is a predictive equation showing the main effects for Y1 and Y2, along with some quadratic effects in a few factors, especially X2. You get a reduced model equation, which leaves out the non-significant effects and gives you an equation in the significant main effects, some two-factor interactions, and some second-order curvature. You can then use that along with the built-in desirability functions and maximize desirability, which takes your predicted outputs in Y1 and Y2 to the level you want. In this case, we want to maximize Y1 and Y2, with Y1 treated as more important. The DSD modeler then gives you the set of factor settings that the model equation thinks will give you the best results in Y1 and Y2. We used that in round one to enter back into the simulator and run validation experiments. Beyond that, in future work, we'll build on sequential experiments and optimize even beyond that level.
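JMP's profiler handles the desirability maximization internally; purely as an illustration of the underlying idea, a larger-the-better desirability for each response can be combined into a weighted geometric mean, with Y1 weighted more heavily. The bounds and weights below are assumptions, not the values used in the talk:

```python
# Illustration of a larger-the-better desirability, in the spirit of
# Derringer-Suich, with Y1 weighted more heavily than Y2.
import numpy as np

def desirability_maximize(y, low, high):
    """Map a response to [0, 1]: 0 at/below `low`, 1 at/above `high`, linear in between."""
    return float(np.clip((y - low) / (high - low), 0.0, 1.0))

def overall_desirability(y1, y2, w1=2.0, w2=1.0):
    """Weighted geometric mean of the individual desirabilities (Y1 counts double)."""
    d1 = desirability_maximize(y1, low=100.0, high=400.0)   # assumed Y1 range
    d2 = desirability_maximize(y2, low=95.0, high=99.5)     # assumed Y2 range
    return (d1 ** w1 * d2 ** w2) ** (1.0 / (w1 + w2))

# Example: compare the predicted outputs of two candidate factor settings.
print(overall_desirability(350.0, 99.0))   # ~0.85
print(overall_desirability(250.0, 98.0))   # ~0.55
```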

Same with the SFD. The SFD, as I said, is a complicated design with points scattered throughout the 12-parameter hyperparameter space. Again, JMP gives you a built-in script to analyze the model, and again, the demo is included on our website; you're welcome to see how the DOEs were set up, how the data was collected, and how it was analyzed. It doesn't go into the inner workings of how DSDs and SFDs work; if you want to discuss that, please reach out. You can run the built-in script, and the SFD's Gaussian process model gives you a very complex, non-monotonic profiler surface over the 12 parameters. You can see that some factors do not seem to have a significant effect on the Y1 and Y2 responses, but the SFD also has the ability to give you a prediction profiler that is non-monotonic and able to capture some of those ridges and valleys that a simple DSD cannot. The SFD model tends to be more predictive than explanatory, in that it does not put much emphasis on what the main effects or quadratic effects are.

It gives you an overall 12-parameter prediction equation, which, again, you can use with JMP's built-in desirability functions to maximize desirability. That gives you a setting, a combination of X1 through X12, that the model equation thinks will give you the best result in Y1 and Y2. We took those factor settings, input them back into the simulator (or, in the real world, your experimental equipment), ran validation runs, and compared the DOEs to each other and to the OFATs.
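JMP fits its own Gaussian process model to the space-filling data; as a hedged sketch of the same idea outside JMP, the fit-and-predict step could look like the following. The data arrays are placeholders, and this scikit-learn model is not JMP's implementation:

```python
# Sketch of fitting a Gaussian process to space-filling-design results and
# predicting the response at a candidate setting. Placeholder data throughout.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(30, 11))        # 30 space-filling runs, 11 continuous factors
y1 = rng.normal(size=30)                        # placeholder for the measured Y1 response

kernel = ConstantKernel() * RBF(length_scale=np.ones(11))   # one length scale per factor
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X, y1)

# Predict Y1 (with uncertainty) at a candidate factor setting before running it.
candidate = rng.uniform(0.0, 1.0, size=(1, 11))
mean, std = gp.predict(candidate, return_std=True)
print(f"predicted Y1 = {mean[0]:.2f} +/- {std[0]:.2f}")
```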

That's what we did. We have four approaches, and we ran three validation runs for each: for the final predicted OFAT results, and for the predicted best-desirability outcomes of Y1 and Y2 from the DOEs. As I said, we treated Y1 as slightly more important than Y2, just for guidance. You can see that the two OFATs on the left are only able to take Y1 into the 100-250 range and Y2 into the 97-98 range, whereas the DOEs come up with a better process optimum that takes Y1 into the 350-plus range and Y2 to near 99. You can also see some of the variability from the built-in noise. Again, this is a known conclusion that DOE experts and evangelists have touted again and again: DOEs are much better and more efficient at dealing with a process optimization problem than OFATs, even with a large number of factors and a very complicated process space.

But the three main advantages that DOEs offer are these. First, they give you a prediction equation, a model for the process space. Second, when coupled with subject matter expertise, they are even more powerful: you can use subject matter expertise to guide the experimental design, and then couple the output analysis of the DOEs with that expertise to really understand what the science is telling you and what the inner workings of the process system are doing, and to design better experiments and optimize even further. Third, they are much more amenable to future experimentation than OFAT, because with OFAT you only keep tweaking one factor at a time and making gradual improvements that may not give you the best signal and will definitely not give you a comprehensive understanding of what the process space looks like. Again, we have concluded that DOEs are better than OFAT. This is not a new conclusion, especially for the greater JMP community, DOE experts, and JMP evangelists.

But we did this for a complex 12-factor system, and we did it comprehensively across four different approaches.

I'm very happy with these results, and we'd love to have further discussions, which takes us to future work. Again, these are just bullet-point suggestions from us, but we'd love to hear from our audience; please write to us with critiques, feedback, comments, suggestions, or discussions. If we continue this work and have the bandwidth, we would love to expand our run budgets and do more rounds of 30 runs, which is typically what we have seen in our own corporate careers. We'd also expand the factor settings based on guidance from the process models, augment the designs, and include more runs to decouple some interactions and optimize even further. We would try more analysis methods: we could use neural networks, self-validating ensemble modeling, and Bayesian optimization; with the latter especially, we are not that familiar, so we are also students of the trade and would love to learn from this example. We would also like to develop more simulators. Simulators are a great advantage for learners, because experimental budgets are not always easy to come by.

Lastly, we'd love to integrate the Python scripts directly into JMP so that classes or students can design DOEs, run them against a simulator through automation, get the data quickly, and analyze and compare optimization strategies then and there. Again, these are bullet points for future-work discussions. We would love to hear from you; please reach us through LinkedIn or email me at asit@samaasa.com.

With that, we really would like to thank everyone who helped with this work. We especially thank Phelan & Associates for creating the virtual PCR simulator; their references are in our reference list. We reached out to them but weren't able to make contact; if there is any collaborative work we can do on this, we would love to hear from them. Thank you to Jackie and Kirsten for enabling this recording, and thank you to the entire JMP community for setting up the JMP conference. We're immensely grateful. Thank you very much.