Advanced Decisions Using Predictive Modeling in the Semiconductor Industry
In semiconductor manufacturing operations, maintaining product yield, cycle time, and quality is critical because the processes are costly and complex. Increasing unit probe (UP) bin fallouts for a qualified device raised delivery concerns and required risk mitigation. Management frequently asks for an estimate of the potential scrap risk from below-minimum yields and/or exceeded statistical bin limits. Planning groups need to know if new wafer lot(s) are needed to backfill the die shortage, while product groups are interested in the yield impact at unit probe. As a device owner, I used data mining techniques to identify shifts in the parametric probe parameters that correspond to UP fallout and made advanced decisions based on predictive models.
In this demonstration, unit probe and class probe data were extracted from the production database and cleaned up. The collected data were split into training, validation, and test data sets. Four models were evaluated and analyzed at the wafer level. Classification decision tree models and bootstrap forest classification models were chosen for the categorical response analysis (scrap or not), while linear regression models and bootstrap forest regression models were compared for the continuous response analysis (UP bin fallout %). Model screening helped reduce the parameters from hundreds to a handful. The final decision on the wafers' disposition was then made based on the constructed predictive models.
When the UP response became available for the test data set, the accuracy of the two prediction models was assessed by comparing the predicted and probed bin fallouts.
My name is Su-Heng Lin. I'm from NXP Semiconductors. As a device owner in a semiconductor manufacturing operation, I know it is critical to maintain product yield, cycle time, and quality because the processes are costly and complex. Any yield issue can raise delivery concerns and require risk mitigation. In this case study, I will demonstrate how to use JMP's predictive modeling platform to make advanced decisions in semiconductor manufacturing operations.
Here is an example of a semiconductor wafer fab process flow. We start with 25 silicon wafers in the wafer lot. All 25 wafers will go through various processes to grow dielectric or conducting materials. They are then patterned to make the integrated circuitry within each die. In this case study, the technology involved more than 300 process steps. It takes about nine weeks of cycle time to finish the wafers before they can be probed and sent on.
But before a wafer goes overseas for the circuitry testing, it needs to pass the wafer acceptance test, also known as Class Probe here. Class Probe tests individual devices to ensure the health of each of the wafers. If the wafers are healthy, they are sent on, in this case overseas, to receive the unit probe test. At this point, the test is performed on each of the individual dice; the entire integrated circuitry is tested.
All dice are tested, and each is either categorized into a passing bin or falls into one of the various failure bins. The white one is the good passing bin here, and you can see some different colors here for the fail bins. Besides tracking the yield for my part, part A here, we also track the bin performance. In this case, I'm going to show you BIN#6. It is one of the bins we have had some problems with recently.
This is the trend chart for the normalized BIN#6. You can see this statistical bin limit, which we calculated based on historical data. If any wafer, which is each of the data points on this chart, exceeds this criterion, it will be scrapped. It doesn't need to be zero yield to be scrapped; exceeding the limit is enough. You can see that recently, in the yellow time zone, we have a lot that suffered many wafers being scrapped because of the elevated BIN#6 issue.
There are some other lots that look okay; they didn't exceed the limit, but they are also elevated. Management comes in and asks, "Are we okay? Do we know the health of our wafers for part A?" In order to answer the question, I have to make a few decisions. First of all, I have 150 wafers from six different lots. They are at end-of-line Class Probe. They have the Class Probe data, but do we want to send the lots on overseas for the unit probe? Are they going to suffer the BIN#6 scrap? That's the first thing I need to know, and I need to make a decision: should I send them on? Are they at high risk or not?
Secondly, if they do suffer yield loss, but not to the scrap level, how many wafers are impacted? How many dice are we going to lose? Am I going to cause a customer line-down because we cannot deliver the dice, the wafers, on time? Lastly, as a device owner, I need to know what may have shifted in line. Although nothing failed at Class Probe, maybe there are interactions between the Class Probe parameters that are responsible for the elevated BIN#6 fallout. I need to know, so I can give recommendations to the in-line process engineers on anything we can do to help maintain the good quality of the dice on the wafers.
What I need here is a predictive model to help me with the risk assessment in this situation. The first step of building a predictive model is to define my responses and predictors. I'm going to use the Class Probe parameters that we collect from each wafer. Because we collect more than one site, I'm going to use the wafer median for those Class Probe parameters. We have a total of 173 continuous factors here. There is some autocorrelation among those 173 factors, because from a transistor we extract related quantities such as the threshold voltage and the drive current.
For the response, obviously, we are going to use the BIN#6 fallout here. We have data for more than 2,000 wafers in my database, and I'm going to use all of them. The normalized BIN#6 fallout is the continuous response I'm going to use. You can see this is the histogram of my normalized BIN#6 fallout. This is the statistical bin limit; anything above it is scrapped.
But I want a tighter criterion during this data mining process. I tightened the criterion to the 90th percentile of the BIN#6 distribution, which equals 0.187 in this case. Anything at or below 0.187 is defined as a good wafer. If it's above, it's a bad wafer, even though it may not be scrapped. This is how I define the categorical response in this predictive model. By doing this, I have around 200 wafers that are my bad wafers, and 1,800 that are my good wafers.
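The talk does this step interactively in JMP, but the good/bad labeling rule is simple enough to sketch outside JMP. Below is a minimal Python sketch using made-up fallout values; only the 0.187 cutoff (the 90th-percentile criterion quoted above) comes from the talk.

```python
import numpy as np

# Hypothetical normalized BIN#6 fallout values for six wafers
# (illustrative numbers, not the production data).
fallout = np.array([0.05, 0.12, 0.30, 0.187, 0.45, 0.10])

# Tightened criterion: the 90th percentile of the historical
# distribution, quoted as 0.187 in the talk.
threshold = 0.187

# Wafers at or below the threshold are labeled good; above it, bad,
# even though a "bad" wafer may still sit below the scrap limit.
labels = np.where(fallout <= threshold, "good", "bad")
```

With real data, the threshold itself would come from `np.quantile(fallout_history, 0.9)` rather than a hard-coded constant.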
We defined the response and the predictors in the previous slide. I'm going to introduce four simple modeling techniques in JMP's predictive modeling platform that can help me make a quick and accurate prediction to reach a good decision. For the categorical response, good versus bad BIN#6, I'm going to use the Decision Tree and the Bootstrap Forest. The Bootstrap Forest will again be used to predict the regression response, which is my normalized BIN#6 fallout, and I'm going to compare it to a fit model, the Stepwise Linear Regression.
After demonstrating these four models, I will show you how we pick which model to use; I will compare the R² to decide. Also, the benefit of using a predictive model is that it will hopefully tell me which key Class Probe parameters are responsible for my BIN#6 fallout.
The first step of data mining in predictive modeling is to divide my 2,000-wafer dataset into a training dataset and a validation dataset. We can do that in JMP's predictive modeling, following the honest assessment recommendation, which is the JMP default. Let me show you in JMP: we go to Analyze, Predictive Modeling, Make Validation Column.
One thing I want to make sure of is this: since I have 25 wafers per lot that go through the process together, I want each lot to be represented in both the training and validation datasets. So I stratified using the source lot, which is my wafer lot ID, and then I click OK. You can see the default in JMP already follows the honest assessment recommendation: 75% for the training dataset and 25% for the validation dataset. I'm not going to run this here because I already created the validation column. This is how you can generate your validation column, and we are going to use it for the models we demonstrate.
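As a rough Python analogue of JMP's Make Validation Column stratified by lot: within each lot, roughly 75% of wafers go to training and 25% to validation, so every lot appears in both sets. The lot names and the 6×25 structure below are illustrative, not the real data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical wafer records: six lots of 25 wafers each, mimicking
# the lot structure described in the talk (names are made up).
lots = np.repeat([f"Lot-{c}" for c in "ABCDEF"], 25)

# Stratify by lot: within each lot, send ~75% of wafers to training
# and ~25% to validation, so every lot is represented in both sets.
assignment = np.empty(lots.size, dtype=object)
for lot in np.unique(lots):
    idx = np.flatnonzero(lots == lot)
    rng.shuffle(idx)
    n_train = int(round(0.75 * idx.size))
    assignment[idx[:n_train]] = "Training"
    assignment[idx[n_train:]] = "Validation"
```

Stratifying within lots is different from keeping whole lots together; the choice here follows the talk's intent that each lot contribute wafers to both datasets.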
Let's go back here. Remember, I mentioned I have 150 wafers at Class Probe, but I don't have the unit probe feedback. I don't know what the BIN#6 fallout is, and I want to know. Even when I put those 150 wafers at Class Probe into my test dataset, they still follow the honest assessment grouping, so I'm good to go. First, I'm going to show you the classification response using the Decision Tree. Let's go back to JMP. This is my dataset. It is under Analyze, Predictive Modeling; we are going to be here a couple of times today. It's under Partition. Let's click on Partition, and you can see that, yes, I have the Decision Tree here.
This is the method I'm going to use for my predictive model. I put in the validation column I created here. I use the BIN#6 class, which is good versus bad; you can see here, this is my BIN#6 class. I have my 173 Class Probe parameters here. I hid them in this data table, but you can see there are 173. Let's click OK. Yes, this is my Decision Tree. I already pre-colored the bad wafers red and the good ones blue. Let me show you manually how to do the split first. JMP will split based on what makes the most significant difference between the good and the bad. Here, it shows the resistance from RSP+: if it's higher than 148, I have many more bad wafers than when the resistance is less than 148. This is the first split JMP found.
I can continue with the second one. It found the ToxP: if it is less than 145, wow, I have many more bad wafers with high normalized BIN#6 fallout, from the combination of the higher resistance and the lower thickness. I can keep splitting manually, or I can click Go and let JMP do the splits for me. It stopped after the sixth split.
Let's look at the R². The R² for the training is about 67%, and for the validation about 60%; not bad. There's something here in the red triangle next to the partition for BIN#6 class: I would like to see the column contributions. This is what helps me learn which parameters JMP summarized for me. Besides the two, the RSP+ and ToxP, JMP also found two other parameters, but with far less contribution to the BIN#6 fallout. We will close this window, minimize my JMP, and go back here. These are the two things I recorded: the R² for this model and the column contributions.
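For readers without JMP, a hedged sketch of the same idea with scikit-learn's `DecisionTreeClassifier` on synthetic data: the parameter names `rsp` and `toxp` and all values are made up to echo the splits described above, and `feature_importances_` plays the role of JMP's column contributions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)

# Synthetic stand-in for the class-probe data: two parameters loosely
# modeled on the ones the tree surfaced (a resistance and an oxide
# thickness); all values are invented for illustration.
n = 400
rsp = rng.normal(140, 10, n)
toxp = rng.normal(150, 5, n)
X = np.column_stack([rsp, toxp])

# Wafers with high resistance AND low oxide thickness are labeled bad,
# echoing the split logic described in the talk.
y = ((rsp > 148) & (toxp < 145)).astype(int)

# A shallow tree, analogous to JMP stopping after a few splits.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Feature importances are the scikit-learn analogue of JMP's
# column contributions report.
importances = tree.feature_importances_
```

Because the synthetic labels are an exact axis-aligned rule, a depth-3 tree recovers it almost perfectly; real probe data would of course be noisier.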
Let's move on to the next model. I'm going to show you the same classification response, but using another partition model called the Bootstrap Forest. Let's go to my JMP. Go to Analyze, Predictive Modeling, and the Bootstrap Forest is just below Partition here. The validation column goes into Validation again. I'm going to do the categorical response, good versus bad, and my Class Probe predictors go into the X factors.
Let's click OK. I'm going to keep it simple and follow the JMP defaults: it's going to build 100 trees in the forest. It averages over the trees, so it is more stable than the Decision Tree, which is only one tree. Now, with 100 trees averaged out, we can find the R² here. The generalized R² tells me that, wow, my R² can be as high as 90% for my training dataset, and it's 80% for my validation. You can see that the Bootstrap Forest improved the R² for both the training and the validation datasets.
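A comparable sketch with a 100-tree random forest (scikit-learn's `RandomForestClassifier`, a close relative of JMP's Bootstrap Forest) on synthetic data; two informative columns plus noise columns stand in for the 173 class-probe parameters, and everything here is invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)

# Synthetic good/bad wafer data: columns 0 and 1 carry the signal,
# the other eight are pure noise.
n = 600
X = rng.normal(size=(n, 10))
y = ((X[:, 0] > 0.5) & (X[:, 1] < -0.2)).astype(int)

# 75/25 split, echoing the honest assessment recommendation.
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

# 100 trees, matching the JMP Bootstrap Forest default in the talk.
forest = RandomForestClassifier(n_estimators=100,
                                random_state=0).fit(X_tr, y_tr)

train_acc = forest.score(X_tr, y_tr)  # accuracy on training data
valid_acc = forest.score(X_va, y_va)  # accuracy on validation data
```

The forest's `feature_importances_` should rank the two informative columns at the top, which is the same story the column contributions report told in JMP.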
We can also go to the column contributions. We have the resistance and the oxide thickness highlighted again. But besides those four parameters, the Bootstrap Forest, because it runs 100 trees, gives us many more parameters, though with very minimal impact on the normalized BIN#6 good-versus-bad response. I'm going to keep this model because it performed better than the Decision Tree. I will minimize it for now, and let's go back to my presentation. We have a very good R², and we have the two highlighted parameters.
Let's move on to the continuous response. This is the regression model I'm going to demonstrate here, using the Stepwise Fit Linear Regression. This is my data table. Go to Analyze, Fit Model. Instead of Predictive Modeling, I'm going to use Fit Model; it's another way to build a model. There's no place for me to put the validation column, because I'm going to use all 2,000+ data points together to make the correlation. I'm going to use the continuous normalized BIN#6 fallout as my Y, and pick the 173 Class Probe parameters as main effects in my model effects.
You could use the macro for degree two; then you would see the interactions between any two factors. But that would take a long time, so I'm not going to run it here; I'm just going to use first order. Since I have so many predictors, I like to use Stepwise; it's more robust. Run the model. Again, I'm going to follow the JMP recommendation: I use minimum BIC as my stopping rule, and the stepwise direction is forward. It will keep adding parameters until we have a good model.
We run the model; it takes a few seconds. After it's done, you can see I got the R² and adjusted R² here: around 74%, not bad, and the two numbers are very close. Let's look at the model. This is my model for the linear regression. It did give me the RSP+, now highlighted, but the oxide thickness, the other one, is ranked a little low. I do have a lot of other parameters showing some strong correlation to the normalized BIN#6 fallout. This is because I have a large dataset, more than 2,000 wafers, so it's easy to find correlations. We will stop here and go back to my PowerPoint.
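Forward stepwise selection with a minimum-BIC stopping rule can be sketched directly in NumPy. This is a simplified illustration on synthetic data, not JMP's implementation: at each round, the candidate predictor that lowers BIC the most is added, and selection stops when no addition improves BIC.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic regression data: 20 candidate predictors, only two of
# which actually drive the response (all values are illustrative).
n, p = 300, 20
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=n)

def bic(y, X_sub):
    """BIC of an OLS fit with intercept, assuming Gaussian errors."""
    A = (np.ones((len(y), 1)) if X_sub is None
         else np.column_stack([np.ones(len(y)), X_sub]))
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    return len(y) * np.log(rss / len(y)) + A.shape[1] * np.log(len(y))

# Forward selection: add the predictor that lowers BIC the most;
# stop when no addition improves it (minimum-BIC stopping rule).
selected = []
best = bic(y, None)
improved = True
while improved:
    improved = False
    for j in range(p):
        if j in selected:
            continue
        cand = bic(y, X[:, selected + [j]])
        if cand < best:
            best, best_j, improved = cand, j, True
    if improved:
        selected.append(best_j)
```

The BIC penalty `k * log(n)` is what keeps the model from soaking up noise predictors, which is the same role the minimum-BIC stopping rule plays in JMP's Stepwise platform.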
This is the R², and this is the effect summary showing which parameters are important in this Stepwise Fit Linear Regression model. For the next model I'm going to demonstrate, we go back to the Bootstrap Forest. It is a very good prediction model because it can also handle regression pretty well. Let's go to my JMP file again: Analyze, Predictive Modeling, Bootstrap Forest.
If you have used JMP before, you know that we can recall the settings we had. But I'm going to replace the categorical BIN#6 class with my normalized BIN#6 fallout, which is a continuous response. I will again follow the JMP recommendation to use 100 trees in my forest. It takes a few seconds, but not too long. Let's see. The Bootstrap Forest for normalized BIN#6 gives me… wow, my R² is around 0.977, about 98%, for my training dataset. For the validation dataset, I got 91%. This outperforms the other model I just showed you, the linear regression. I'm going to keep this one; let's take a look before we move on.
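The regression counterpart, sketched with scikit-learn's `RandomForestRegressor` on synthetic data (illustrative values only). Note how the training R² runs well above the validation R², the same pattern as the JMP output described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)

# Synthetic continuous response standing in for the normalized BIN#6
# fallout: two main effects plus an interaction, plus noise.
n = 600
X = rng.normal(size=(n, 10))
y = (1.5 * X[:, 0] - X[:, 1] + 0.5 * X[:, 0] * X[:, 1]
     + rng.normal(scale=0.3, size=n))

X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

# 100 trees again, per the JMP default cited in the talk.
forest = RandomForestRegressor(n_estimators=100,
                               random_state=0).fit(X_tr, y_tr)

train_r2 = forest.score(X_tr, y_tr)  # R^2 on training data
valid_r2 = forest.score(X_va, y_va)  # R^2 on validation data
```

The gap between training and validation R² is expected for bagged trees; the validation number is the one that matters for the model comparison in the next section.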
Column contributions: we have the same two, resistance and oxide thickness, highlighted. We have another two that are high, and a lot of other parameters with small contributions to the model. In this case, if you want to build a reduced model, you can highlight the top parameters here, go to Analyze, Fit Model, and then under Macros I like to use Response Surface. Why? Because it captures the main effects, and it also captures the interaction and the quadratic effects.
I put my normalized BIN#6 in Y, and because I don't have a lot of predictors here, I use standard least squares and run. This is the model you get, and you can reduce it from here. This reduced model only gives me 65% R². It is good, and it can be more stable because it doesn't involve so many predictors. The model is easier to explain, but the bias will be higher because it doesn't include all the other parameters highlighted by the Bootstrap Forest model.
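A reduced response-surface fit on just two highlighted parameters can be sketched as a degree-2 polynomial regression: main effects, the interaction, and quadratic terms. The parameter names and data below are invented for illustration; only the structure (two predictors, second-order terms) follows the talk.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(5)

# Two made-up parameters standing in for the highlighted resistance
# and oxide thickness; the response includes an interaction term.
n = 300
rsp = rng.normal(140, 10, n)
toxp = rng.normal(150, 5, n)
X = np.column_stack([rsp, toxp])
y = (0.01 * (rsp - 140)
     + 0.002 * (rsp - 140) * (150 - toxp)
     + rng.normal(scale=0.05, size=n))

# degree=2 expands X to [rsp, toxp, rsp^2, rsp*toxp, toxp^2]:
# main effects, the interaction, and the quadratic terms.
quad = PolynomialFeatures(degree=2, include_bias=False)
model = LinearRegression().fit(quad.fit_transform(X), y)
r2 = model.score(quad.transform(X), y)
```

Because the true interaction term is in the expanded basis, the reduced model fits well here; on real data, dropping the many small-contribution predictors trades some fit (bias) for a simpler, more explainable model, as noted above.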
Let's go back to my presentation. Now we've finished all four models; let's look at the R² comparison. It is not entirely fair to compare the Stepwise Linear Regression, because it's not using the same platform: it's not training versus validation, it's R² versus adjusted R². But you can see that my Bootstrap Forest R² values, for both the classification response and the regression response, are very high, above 90% for training, and they are also the highest for the validation dataset. I decided to use the Bootstrap Forest both for my good-versus-bad prediction and for my estimation of the BIN#6 fallout.
Now, how do I estimate with it? I go back here; remember, this is my Bootstrap Forest for the class data, good versus bad. In order to use the predictive model, I go to the red triangle, Save Columns, and save the prediction formula. You can see that JMP quickly gives me the probability of bad versus good, and then the most likely BIN#6 class here.
If you go to the very bottom of my dataset, you can see the 150 wafers that I hid in the dataset when I built the model. The equations generated by the model apply to those 150 wafers, and they predict: okay, I will have one bad wafer from this lot, Lot-D, wafer one, and so on. Lot-C has a lot of bad wafers, and Lot-A has some bad wafers. It's a very quick way to tell me how many wafers are at risk. But this only gives me good versus bad.
What is the estimated BIN#6 fallout? If I can get that number, it's even better for helping me make a correct decision. So the model I want is this one: the Bootstrap Forest for the normalized BIN#6 fallout, the regression model. Very similarly, I go to the red triangle and save the prediction formula. It gives me the normalized prediction here. For each of the 150 wafers under Test in the validation column, I get a predicted number for the BIN#6 fallout. This is how you use predictive modeling in JMP to get an estimated number to assess the risk of those 150 wafers.
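The save-prediction-formula step amounts to applying a trained model to new wafers that have predictors but no response yet. A hedged sketch with synthetic data: the 0.187 risk threshold comes from the talk, everything else (the data, the 5-column feature set, the squared relationship) is made up for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)

# Train on historical wafers (synthetic data), then apply the fitted
# model to new wafers that have class-probe data but no unit-probe
# result yet, mirroring the 150 held-out wafers in the talk.
X_hist = rng.normal(size=(500, 5))
y_hist = X_hist[:, 0] ** 2 + rng.normal(scale=0.1, size=500)
model = RandomForestRegressor(n_estimators=100,
                              random_state=0).fit(X_hist, y_hist)

X_new = rng.normal(size=(150, 5))          # wafers awaiting unit probe
predicted_fallout = model.predict(X_new)   # one estimate per wafer

# Count wafers predicted above the tightened 0.187 criterion.
n_at_risk = int((predicted_fallout > 0.187).sum())
```

This per-wafer estimate is what turns the model into a disposition tool: a count of at-risk wafers supports the send-or-hold decision discussed next.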
In the end, we found that about 47 wafers will be bad, meaning elevated BIN#6 fallout, but none of them will trigger the scrap limit. We decided: okay, we will send all 150 wafers on to unit probe; four weeks later, we will get the unit probe data and review how good our model is. The decisions we made: we send out the wafers, because none of them is at scrap risk. Also, we don't need to start a new lot, because the overall die impact from the estimated BIN#6 fallout is not enough to warrant a new lot start as backup.
I also learned which two parameters are the key ones responsible for the BIN#6 fallout. After talking to our subject-matter experts, the team decided we can work on the resistance that is high. We can do process windowing and see if we can lower the resistance and keep up the yield for part A. The other parameter, ToxP, comes from a batch process: wafers of different parts can be processed together under the same technology, so it is not suitable for a process change. The team decided to tighten the in-line control limits to ensure stable performance for this ToxP.
Four weeks later, we got the unit probe feedback from these 150 wafers. You can see that, yes, we do have two lots that are higher than the others for the measured BIN#6 percentage, plotted against the predicted percentage of the BIN#6 fallout. The correlation is about 71%. That is not bad; it did help me make the right decision. None of the wafers exceeded the scrap criterion, so they were not at high risk. But you can see, obviously, one lot, Lot-C, behaved oddly. We reviewed the fab processing history and realized that Lot-C is not a standard product: it was used for a process-change evaluation. We have to exclude it from the modeling. With that, we can improve our R² up to 80%.
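Comparing predicted and measured fallout, and re-checking the fit after excluding an odd lot, can be sketched like this. All numbers are synthetic, loosely mirroring the Lot-C story: one 25-wafer lot carries a systematic offset, and excluding it raises the R².

```python
import numpy as np

def r2(actual, predicted):
    """Coefficient of determination between measured and predicted."""
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(6)

# Hypothetical predicted vs. measured fallout for 150 wafers, with one
# 25-wafer lot behaving oddly (a non-standard, process-change lot).
predicted = rng.uniform(0.0, 0.2, 150)
measured = predicted + rng.normal(scale=0.03, size=150)
odd_lot = slice(50, 75)          # the odd lot's wafer positions
measured[odd_lot] += 0.15        # systematic offset on that lot

mask = np.ones(150, dtype=bool)
mask[odd_lot] = False

r2_all = r2(measured, predicted)            # fit including the odd lot
r2_excl = r2(measured[mask], predicted[mask])  # fit excluding it
```

The point is diagnostic, not cosmetic: a single non-representative lot can drag down the apparent model fit, so checking residuals lot by lot before judging the model is worthwhile.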
In summary, I was looking for predictive modeling to help me make a good decision quickly, not a DOE that takes months to get lots through the line. The model needs to handle the autocorrelation without a lot of data manipulation, and without being limited by computer memory constraints. We talked about the Partition (Decision Tree) and the Bootstrap Forest for the categorical data: they are good at handling a large number of predictors, and they don't mind autocorrelated predictors either.
We also talked about the Bootstrap Forest outperforming the linear regression. It can handle both classification and regression data, is very good with a large number of predictors, and also good with autocorrelated predictors. I hope this has been helpful and encourages you to try your first predictive model in JMP software, so you can make an educated decision based on data. This is the end of my presentation. Thank you for your attention.