In the fab environment, wafer value is at a premium, and process innovation must be achieved with minimal wafer risk and resources. When working to improve or change a process outcome (e.g., reduce roughness or achieve a target etch depth), the experimentation process typically requires approval for a predetermined number of wafers and amount of time. When placed under such tight constraints, what is the most efficient approach? How can a process engineer take advantage of not only their subject matter expertise but also their historical data? Enter Bayesian optimization (a.k.a. sequential learning or active learning).
Bayesian optimization provides an iterative, intelligent approach to identifying the best possible factor combinations to achieve a desired outcome (e.g., maximize yield or reduce defects). In this paper, we use a semiconductor process engineering example in which we analyze historical data to iteratively improve factor settings toward a new or improved outcome. We show that whether you have minimal data or months of historical data, Bayesian optimization will provide a series of parameter values to test and, with each result, improve the desired outcome. When wafers are at a premium and a process change needs to be achieved accurately with minimal wafer waste, Bayesian optimization can vastly reduce both time and waste.

Hello. Welcome to our Discovery talk titled Bayesian Optimization: A Goal-Oriented Approach to Process Development. Thanks for taking the time to join us. Today, we're going to be covering this new platform that was recently released in JMP 19. We'll start by walking you through what Bayesian optimization is and how it works within the platform, so the what and the how. Then we want to highlight how it is uniquely powerful in JMP, because that's a very important part of this tool's value.
We'll then move into when Bayesian optimization is the optimal tool to use versus something more traditional. Then we'll go through two case studies of increasing complexity to highlight the value of Bayesian optimization. We'll circle back towards the end to highlight the value specifically for the semiconductor industry, and then wrap up by talking about how best to implement this in a more traditional semiconductor process development workflow. Thank you for being here.
Thanks, Peter. I will give a quick overview of what Bayesian optimization is and a little bit about how the algorithm works. Bayesian optimization is an iterative optimization algorithm for finding the best combination of factor settings to satisfy your response goals. It's a generative learning technique that uses Gaussian stochastic process models as a base to generate what we call candidate runs: new settings of your factor combinations to test out.
The basic workflow of Bayesian optimization is that you start with some existing data. This existing data could be a starting DOE design, or it could be historical data that you have sitting in a database.
From there, you take that data, you construct your initial Gaussian process model, and then you use the BayesOpt algorithm based on the model predictions, the prediction uncertainty, and your response goals to generate those new candidate sets to test. Once you test a new candidate run, you then update the model with the new response results, and that sequence repeats itself until your optimal factor settings are discovered.
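To make that loop concrete, here is a minimal sketch in Python of the fit, score, run, update rhythm just described. This is not JMP's implementation, only the generic Bayesian optimization loop under stated assumptions: `run_process` is a hypothetical stand-in for performing one real experiment, `candidate_grid` is a set of factor combinations under consideration, and the acquisition criterion here is a simple upper confidence bound rather than JMP's desirability-based criteria.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def bayes_opt_loop(X, y, run_process, candidate_grid, n_iter=10):
    """Generic Bayesian optimization loop (conceptual sketch)."""
    for _ in range(n_iter):
        # 1. Fit the Gaussian process model to all data gathered so far.
        gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X, y)
        # 2. Score candidates using predictions AND uncertainty; the
        #    +2*sigma term trades off exploitation against exploration.
        mu, sigma = gp.predict(candidate_grid, return_std=True)
        best = candidate_grid[np.argmax(mu + 2.0 * sigma)]
        # 3. Run the suggested candidate and update the training data.
        X = np.vstack([X, best])
        y = np.append(y, run_process(best))
    return X, y
```

JMP's auto mode swaps the acquisition criterion as it moves through its phases, but the overall sequence is the same.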
On this flowchart diagram, Determine Next Runs is where the Bayesian optimization magic happens. This is done through a guided search by balancing both exploration of the process space and exploitation of the model that you've created.
Why is Bayesian optimization so innovative? Well, it accounts for the responses with each iteration, providing clear guidance on when you can stop experimenting. It's a very goal-driven methodology. It goes beyond just your inputs, and it considers your outputs and your business goals.
This combines a sequential learning algorithm with JMP's profiler and desirability functions, creating a very versatile new approach to optimization. As Peter mentioned previously, the platform in JMP is quite unique because it is built on top of JMP's powerful profiler functionality.
That means it will work with multiple responses that might have competing goals: maximizing, minimizing, or matching a target. It can also handle categorical inputs, factor constraints, and potentially messy training data with missing inputs. The platform in JMP is very powerful.
Why is this powerful for semiconductor manufacturing? Well, it's an iterative, "filling in the knowledge gaps" type of approach, so it's a natural fit for semiconductor process development. BayesOpt, as we're calling it, will also point to the holes in your process space that might be opportune to explore. It also makes critical and efficient use of available historical data, so the data you have sitting in databases can actually be utilized.
A little bit about how the Bayesian optimization process works. I will first take a moment to briefly describe the prediction profiler in case you're unfamiliar with that, because the prediction profiler is a very critical element of the Bayesian optimization platform in JMP.
The prediction profiler in JMP is a fully interactive illustration of a model. You can see an example screenshot of the prediction profiler here. You can manipulate the sliders in the window here, and you can change factor values, and then you can immediately see your updated model predictions.
What Bayesian optimization does is it uses your model predictions from your Gaussian process model, which is the base model that you've built. It also takes the prediction uncertainty and incorporates your response goals, and it gives you new factor combinations to test out. Now I will pass it back to Peter to give you a little more detail about how the algorithm works.
We're going to look at just a couple of images from the platform itself, and then in the case studies, we'll actually get into the platform and use it in real time. When you go into the Bayesian optimization platform, there are two approaches. I'll briefly talk about the more custom approach. We're not going to be demonstrating that today, but all of the custom tools can be called upon in the auto mode, which is what we'll be highlighting.
You can see, in the lower right, we have these profiler shortcuts. These are the different methods Bayesian optimization may deploy to find the optimal next candidate set. Let's say, for example, your goal isn't necessarily to optimize the responses, but you specifically want to fill in knowledge gaps.
Well, you could, for example, use Maximize Bayesian Desirability Standard Deviation, and it's going to shift that candidate set around to identify the areas where your standard deviation is high, then develop candidate sets to narrow those regions down and gain new insights. In that platform, you have full control over the approach the Bayesian optimization platform takes, if you so desire.
We're going to focus on the automatic mode. The auto mode is going to pull from this menu of profiler shortcuts to achieve the goal of maximizing your response goals. I'm using "maximizing" loosely here; you may be specifically maximizing, minimizing, or trying to match a target. Whatever those response goals are, it's going to pull from these menu items to achieve that end.
This is a much simpler approach because it's fully automated. There are three phases you're going to march through, depending on the state of your initial dataset and that initial model. If the initial model is not good, you're going to start in Space Filling Exploration.
The platform is going to identify candidates that help fill out its knowledge base until it can build a Gaussian process model that is sufficient to move forward. Once you do that, you're going to be refining your model. Once you achieve what the platform identifies as the best model to achieve your response goals, it's going to move into a confirm and challenge phase.
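In outline, the auto mode's progression looks something like the following sketch. The exact decision rules are internal to JMP; this only captures the three-phase structure just described, with the default 0.25 R-square threshold mentioned later in the talk.

```python
def choose_phase(r_squared, goals_met, best_confirmed, threshold=0.25):
    """Assumed sketch of the auto mode's three-phase progression."""
    if r_squared < threshold:
        # Model too weak to trust: MaxPro space-filling runs build it out.
        return "Space Filling Exploration"
    if not goals_met:
        # Model is usable: chase the response goals, e.g., via max
        # expected improvement or max desirability criteria.
        return "Refine Model"
    if not best_confirmed:
        # Best settings found: replicate the best training run to confirm.
        return "Confirm and Challenge"
    return "Stop"
```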
What do those different phases look like? I'm going to march through four of these images on this slide and on the next, just highlighting really the change in the prediction profiler uncertainty and what stage of the model building we are in.
On the left here, we are in Space Filling mode. If we look at the prediction profiler, we see large, thick gray areas (that's our uncertainty) with little peaks around 66, 8, and 106. At this point, our Gaussian process model is insufficient to move forward, so we go into this MaxPro Space Filling regime where we start building out that model.
Once we have a sufficient model, which by default means an R-square of 0.25 (that can be altered if you would like), we go into the refine model phase. This can call upon several different parts of that menu I showed previously, but in this example, we're looking for max expected improvement. We can see now our red sliders are where the uncertainty is minimal and the response is maximal, because this is built off of our desirability.
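For reference, the textbook expected improvement criterion scores a candidate $x$ by how much it is expected to beat the best value $f^*$ seen so far, using the model prediction $\mu(x)$ and its uncertainty $\sigma(x)$. JMP applies this idea to desirability rather than to a raw response, so take this only as the underlying concept:

$$\mathrm{EI}(x) = \bigl(\mu(x) - f^*\bigr)\,\Phi(z) + \sigma(x)\,\phi(z), \qquad z = \frac{\mu(x) - f^*}{\sigma(x)},$$

where $\Phi$ and $\phi$ are the standard normal CDF and PDF. The first term rewards candidates predicted to be good (exploitation); the second rewards candidates the model is unsure about (exploration).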
As we continue through max expected improvement, we keep refining that model, and we move into, say, a max desirability mode where we're really trying to get the best model possible. We've filled it out, we've probed the uncertain regions, and now we think we're homing in on the best possible model. Once we achieve that model, the platform is going to confirm and then challenge it. It wants to be sure that it's found the best model, so we go into this replicate best training run mode.
Moving on from there, when would you want to use the Bayesian optimization platform? Now this is key: it is not the tool to end all other tools. Here are just a few examples. This is an incomplete list, but I want to highlight some opportunities where it may be beneficial. Let's say you run a designed experiment and many of your runs fail due to, say, equipment failure.
Well, you might be able to use the Bayesian optimization platform to save that DOE. Then, as Sarah mentioned, there's using historical data. The two example use cases we're going to show today focus on these first two. But there are lots of other benefits when you're time or resource constrained, because the Bayesian optimization platform will tell you when you've achieved your end.
Maybe a classical design is not ideal: there are too many factors, you don't know what the model form is or should be, or there are just too many runs to characterize. In the example I'll show, building a response surface model in the DOE platform would take over 70 runs. That's just too inefficient for innovation.
Then there may be instances where one run at a time makes more sense, like a batch process scenario. There also might be situations where a standard least squares model is too rigid. In all of these situations, Bayesian optimization may be the best step forward to achieve an end in minimal time.
Thank you, Peter. I am going to go ahead and go through our first case study. In this first case study, we're going to be looking at a CMP (chemical mechanical planarization) screening experiment that has gone bad. This is a relatively simple example, and then Peter's case study will move us into a more complicated one.
In this example, we have just one response, film removal rate, and we want to match a target for it. We have three factors in this experiment: slurry flow rate in milliliters per minute, the downforce being applied to the wafer in PSI, and the platen rotations per minute.
The goal here, again, is to salvage this DOE. In this case study, we've started out with nine initial DOE runs, and we've lost five of those. Let me go ahead now and just bring up JMP because we'll do this live inside of JMP. I'm going to start with just a setup to show you a few iterations of the algorithm and the platform, and then I'll show you the full final result.
The purpose here is just to kind of introduce you to the platform, show you a little bit of how the auto mode in the Bayesian optimization platform works, and then I'll show you the final result of how we salvaged this DOE. Again, we started with nine runs, eight factorial design points and then one center point run. Due to tool faults and measurement issues, we lost five of those runs.
Just to give you an idea of the hole that we're starting in here, let's just quickly do a little scatter plot of the factor space that we're working in here and where the four runs are that we have actual viable data from. If you look at this process space, we have data from these four corner points here, and that's all we have left. That's all we have to work with to start.
There's a huge area of unexplored process space that we don't have any idea what's going on, and we want to be able to match our film removal target in the least amount of runs possible, trying to work with what little data that we have left. Let's go ahead and enter the platform from the analyze menu, specialized modeling, and it's under Bayesian optimization.
I'm going to take my factors, my three X variables, and put them in the X bucket, and my film removal rate that I'm trying to optimize in my Y bucket. Now, I'm just going to do one candidate run at a time. Peter, in his case study, will show you how you can alter this and potentially do as many runs at a time as you want. Also, if you open up the advanced batch options, you can see where you're able to adjust that R-square value that Peter was talking about so that the Gaussian process model kicks in faster or not as soon.
0.25 is the current default. While that may seem very low, keep in mind that the model being able to make the best predictions at this point isn't necessarily the goal. We just need the model to be able to tell us if we're getting hotter or colder.
I'm going to go ahead and click okay. To start, the first thing we're going to look at is that R-square value of our Gaussian process model. You can see not starting off too great in terms of the model fit, 0.142. Right now, we're below the R-square threshold. We are going to be in Space Filling mode where we're just filling in holes in that process space because we had a lot of process space that was unexplored.
If we go to our batch selection tab, we can see that the candidate run that we're going to be adding as our next potential training run is going to be under the Space Filling regime. A couple of things under Include Options that you might want to consider. I'm going to uncheck the save desirability values just because I don't need those at this point.
But one thing that you might like to include is the predicted response value so that you can see what your Gaussian process model is predicting. I'm going to go ahead and hit Make Table. It's going to add the next candidate run. Here you can see this combination of factor settings is what it selected as the next viable run, and it is Space Filling. Right now, the Gaussian process model is not being fully employed yet because we haven't met that R-square threshold.
I want to keep going. Note here that I've set this up to automatically input the next response value. In real life here, what would have happened is you would have gone and actually ran this run of your process, collected your response data, your film removal rate, and entered that here. Now we move to the next iteration.
What's nice is that we don't actually have to go back into the platform. The platform creates these scripts for us here where we can just quickly get our next run in auto mode by running that script. We can now see, with that run, we're getting a little better. We're almost to our R-square threshold. We might be out of Space Filling mode soon.
You can see we're starting to get a little action down here going on in our profilers. Let's go ahead and make our next run. I'm just going to hit Make Table. Still in Space Filling mode because we haven't reached that R-square threshold. Let's go ahead and keep going and do another run. Make Table. Still Space Filling, so let's see if we get out of Space Filling mode in our next run.
We finally are going to make it out of Space Filling. We've now got a good enough Gaussian process model that we can kick it into potentially the refine model phase, or depending on how good it is, we might even jump straight to the confirm and challenge phase. Let's see where we are.
It looks like it's going to jump us straight to the confirm and challenge phase. We've hit Replicate Best Training Run, which means it hasn't found a candidate run that is any better than something it's already seen on the data table. In theory, you could potentially stop here. The predicted film removal rate is about 2391. By the way, the target for our film removal rate was 2400 angstroms per minute.
I also want to point out that because Bayesian optimization is a highly goal-oriented approach, it's very, very critical that you set your response limits before you actually start performing the optimization algorithm. Where we do that is in the column info of your response variable. You can see here I have a response limits property that is set up, and maybe I'll just quickly remove that and start it from scratch, so you can see where that's done.
In the column properties, I want to go into response limits, and this is where I can set my response goals. What you can see here is that I'm matching a target, and all I actually have to set are my lower and upper response limits, and then the platform will infer the rest. It will infer that the target I want to match is the midpoint of this range, which would be 2400 angstroms.
Peter will talk a little bit more about setting the response goals for when you're minimizing or maximizing because he will be doing a multiple response scenario.
Now we've seen a few iterations. This is one possible iteration of the algorithm. I could actually continue here, because even after replicate best training run, it might go further into the confirm and challenge phase, and it might even find better solutions near this location on the response surface, depending on what that response surface actually looks like.
But in the interest of time, I'm just going to show you sort of the final result of one of the Bayesian optimization runs here. In this case, we stayed in Space Filling for four runs, and then we went into refine model and then confirm and challenge, and this gave us a best training run of about 70 milliliters per minute for our slurry flow rate, around 7 PSI for downforce, and about 105 RPMs for our platen rotation.
This took us six additional runs of our process. Maybe you're thinking, well, that feels like a lot, but remember where we started. Remember that 3D scatter plot of the four runs that we were left with after five of our DOE runs were unviable. This is really quite efficient to be able to optimize a response in this few runs. Now I'm going to go ahead and pass it back to Peter for his case study.
We're going to be focusing on gate stack characterization. What's unique about this one is we're going to have competing response goals. We're going to maximize one Y, minimize another Y, and then try to hit a target on a third. Further, I've taken a screen grab of the historical data I'm pulling from.
For our third Y, which is electrical performance, we're not only trying to hit a target, we're trying to hit a target that none of our historical data has achieved. We're going to build this using six continuous X's and two categorical factors. Our goal is to improve the overall process control for this particular process, but also to hit this unique target.
I referred to this earlier, but to build a response surface model covering all of this factor space would take 75 runs. We were able to achieve the desired outcome in 11. I did this several times, and it took 10 to 13 overall Bayesian optimization runs each time. Let's go ahead and pull up the initial data table.
A little backstory on this data. This data has all been anonymized, so the values don't quite make sense for the units. Hopefully, you can see past that. I pulled this from a historic dataset where I have well over a thousand rows. What I did is I built some pretty accurate models to predict my responses, and then I just selected five rows at random.
The reason I did that is I want to highlight how little data you can use to get started. Based off of the candidate factor space, I'll use those models built from the entire dataset to predict what the likely response would have been for each new candidate run.
Again, Sarah highlighted this, but when we're looking at, say, maximizing or minimizing, all I entered was the lower limit. I entered 0.93, and when you go into the Bayesian optimization platform, if you leave the default selections, it's going to populate everything else that you need.
When you're maximizing, you enter the lowest acceptable value. When you are minimizing, you enter the maximum acceptable value. Then again, as Sarah highlighted, when you're trying to match target, you just need to enter the min and the max, and then it's going to infer what the target is.
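As a rough illustration of how those entries become desirabilities, here is a piecewise, Derringer-Suich-style sketch in Python. JMP's actual desirability functions are smooth variants configured through the response limits property, so treat this only as the concept; the 2300/2500 limits in the comment are hypothetical values chosen to reproduce the 2400 target from the first case study.

```python
import numpy as np

def d_maximize(y, lower, upper):
    """0 below the lowest acceptable value, rising to 1 at the upper limit."""
    return np.clip((y - lower) / (upper - lower), 0.0, 1.0)

def d_minimize(y, lower, upper):
    """1 at the lower limit, falling to 0 at the maximum acceptable value."""
    return np.clip((upper - y) / (upper - lower), 0.0, 1.0)

def d_target(y, lower, upper):
    """Peaks at the inferred target, the midpoint of [lower, upper].
    E.g., hypothetical limits of 2300 and 2500 infer a 2400 target."""
    target = (lower + upper) / 2.0
    half_width = (upper - lower) / 2.0
    return np.clip(1.0 - np.abs(y - target) / half_width, 0.0, 1.0)

def overall(desirabilities):
    """Competing goals combine via the geometric mean of the desirabilities."""
    return float(np.prod(desirabilities) ** (1.0 / len(desirabilities)))
```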
We're going to jump right into the platform. Let me pull in my desired responses, and we'll pull in our factors. As Sarah alluded to, you can go one at a time, but you can also go 2, 3, 4, or 5 at a time. There may be certain situations where running more than one run at a time makes more sense.
If we do that, using that Gaussian process model, we have a great model right out of the gate for measuring our defect density. We have an adequate model for electrical performance, but we have a poor model for our yield. I'm going to go into that batch selection.
Earlier I referred to the ability to custom control the flow of the Bayesian optimization. That would be in this menu area right here. I would just expand this, and then I can have all those options. Similar to Sarah, I'm going to go ahead and go in here and turn on my predicted responses.
Then there's this checkbox right here; Sarah unchecked that. When you leave it checked, it will populate the remainder of the response goals in your column headers. It's completely optional. When we go ahead and run two at a time, we can see they're both in MaxPro Space Filling, and because those defaults are now encapsulated entirely within that script, every time I run it, it's going to use two runs at a time.
After we've done this one time, we can see that all of our models have now achieved the threshold of 0.25. What you'll see very frequently when you do more than one run at a time is other of those menu items being pulled in. Not only are we going to try to find the max expected improvement, but we're also going to do another run in the space that is least understood.
We can go ahead and run those and then continue this, and I'll do this maybe one or two more times and just see how far we get, and then we'll jump over to the final results. Again, we can see we're oscillating between this max expected improvement and max desirability standard deviation, and we can continue in this process for some time. But let's jump over. I ran this earlier, and in this instance, I did one iteration at a time.
You can see we stayed in MaxPro Space Filling for about three runs, then Max Expected Improvement for four, Max Desirability for three more, and then we go right into that replicate phase. I ran the replicate a couple of times just out of curiosity. We started with five initial runs, and again, we're trying to shift an entire process space into unknown territory.
What I want to encourage you to think about is how you would approach this in your day-to-day work. What amount of effort, time, and resources would that take? Then acknowledge that it only took 11 runs to get there. This has such a profound ability to save time, money, and resources when it's deployed in a scenario where it makes the most sense.
What makes the most sense? Well, sometimes it's just a matter of trying it out. Sometimes it's a matter of sitting down with colleagues and having a discussion. Sometimes you've just got to jump into the deep end of the pool. All right, let's get back to our slides. I'll kick it over to Sarah.
All right. Thanks, Peter. To cap that off, here are a few of the benefits of Bayesian optimization for semiconductor manufacturing. Again, you can have smaller starting designs, and you can stop when your goals are met. You don't need a fixed upfront sample size; you just iterate until you hit those response goals, so there's no wasted information and no collection of unnecessary data. Less waste, less cost.
There's also an ease of adoption compared to many of the classical statistical methods that we use for process optimization. It's statistically simpler in the sense that there are no p-values to evaluate, no normality tests to check, no residual plots to look at. While it might be computationally complex with the construction of the Gaussian process models, from an interpretation standpoint, there is definitely less statistical machinery to worry about.
Also, there's no need to worry about doing effect selection to build your model. If you're used to building standard least squares regression models, you'll know that you have to decide whether you need to model interactions or any polynomial curvature that might be present. There's no need to do that here, because Gaussian process models naturally learn the response surface shape as you iterate.
Gaussian process models, if you're not familiar with them, are definitely worth 5 to 10 minutes of your time to read up on. They're a very flexible model-fitting technique, a machine learning, kernel-based method. If you have complex response surface shapes, Gaussian process models can learn those in a very flexible and efficient way, as opposed to something like standard least squares regression modeling. Lastly, it's a guided iterative approach, which aligns very naturally with how a semiconductor process development engineer typically thinks.
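If you want a hands-on feel for that kernel-based flexibility outside JMP, here is a minimal generic Gaussian process fit in Python using scikit-learn. This is a generic GP on toy data, not JMP's implementation:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy data: a curved response that a straight-line model would miss.
X = np.linspace(0.0, 10.0, 8).reshape(-1, 1)
y = np.sin(X).ravel()

# The RBF kernel learns smooth curvature directly from the data; no
# interaction or polynomial terms need to be specified up front.
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X, y)

# Predictions come with uncertainty, which is what drives the guided search.
mu, sigma = gp.predict(np.array([[5.0]]), return_std=True)
print(f"prediction {mu[0]:.2f} +/- {sigma[0]:.2f}")
```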
Just to compare some common process development strategies currently employed in semiconductor: we've obviously got our ad hoc approaches, which are trial and error combined with maybe some tribal knowledge or gut instinct. The problem is that this inevitably leads you into the trap of one-factor-at-a-time experimentation, which is probably the least efficient way to gather information about a process. Also, uncertainty is not quantified in this approach.
Then we also have our classical design of experiments approach, where perhaps you consult with a process engineer and start with a screening design to screen a large initial set of factors. Once you have an idea of which knobs are your big knobs, you might augment that design to resolve interaction effects and maybe model some curvature. Then you fit your final response surface model for doing the actual optimization.
I want to make sure to point out that classical DOE has been employed as a process development approach for decades now, and it has been very successful; we do not expect that to change anytime soon. With its decades of industrial application, DOE will obviously continue to be employed.
But some of the challenges of the classical DOE approach that Bayesian optimization can help solve include the fact that you often need a large upfront sample size, a large initial set of DOE runs, to establish that model. There's a heavy focus on statistical issues as opposed to your actual problem-solving.
It's driven by the factor space rather than the actual response values and your business goals. It can often place your design points at the extremes and centers of ranges, while the optimal region of your design or process space might go completely undetected.
Bayesian optimization is not a replacement for a classical DOE approach. It's simply another tool in the toolbox that can help us deal with difficult situations like some of the situations that Peter described previously on when to use Bayesian optimization.
We want to wrap up by reiterating the value of Bayesian optimization. I think first and foremost, and this fits so well into the semiconductor environment, it's a very iterative approach. You set up your goals, you get some initial data, whether it's from a basic designed experiment or from historical data, and then you can just move iteratively and let the results guide the process.
Let the results guide the next run, and then it's going to tell you when to stop. Not only does it account for the goals, using those desirability functions in the profiler to quickly home in on the best candidate factor set, but then it tells you: you are finished. There's great value in being told "I'm done" and being able to move on. It is such a great tool for reducing time, waste, and expense.
I can't tell you how many times I talk to my customers in the semiconductor environment, and they're like, "I can only get so many wafers to run this, and I can't fit my design to the number of wafers I have." This is where Bayesian optimization can truly benefit those challenges, which are ubiquitous across the industry. It easily accommodates current common semiconductor processes and allows those users to innovate their techniques and processes in a very natural way.
What I love about it most is that it is so computationally complex, but because it's baked right into that profiler, it's so easy to digest those results, interpret them, and really understand: why am I moving in this direction? Based off of those results, how close do we think we're getting? We're space filling, we're now refining, and now we're confirming and challenging. It's a very nice, gentle breeze pushing you through your semiconductor process development.
It's a very powerful tool now available in JMP 19, and I think it's going to be, as Sarah said, a fantastic tool in that toolbox to take advantage of.
This very well may have left you with more questions than answers. We've really just scratched the surface of what this platform can do. We wanted to point you to a few resources. You can pull out your phone and scan this QR code, and it will take you directly to a recording of an ENBIS workshop that Chris Gotwalt and Phil Kay gave on this topic a little while ago. Also, if you are looking at these slides, you can click on the images below, and they will take you right to some PDF content.
Really, we need to give a huge thanks, particularly to Chris Gotwalt and his team. They worked tirelessly to develop this, test it, and really refine it. Our hope is that you take advantage of this tool and let us hear back: where are your successes, where are the shortcomings, and what would be the next best improvements for you? Thank you for your time. We hope you're having a great Discovery, and we appreciate you being here today. Thank you.