
Revolutionizing Semiconductor Manufacturing Tests with Predictive Modeling

The semiconductor manufacturing industry stands on the brink of a transformative era, powered by advanced analytical techniques. This presentation delves into the application of predictive modeling and diagnostic analysis within JMP software to significantly enhance manufacturing outcomes, particularly during the crucial early sort and class test phases. By leveraging comprehensive parametric data collected across various stages of the semiconductor production process, we embark on a journey to refine the prediction of unit-level pass/fail outcomes and unearth the underlying causes of potential defects.

Our study highlights the strategic use of JMP’s predictive modeling capabilities to accurately forecast the final system-level test status of semiconductor products. This approach not only allows for early detection of issues but also facilitates the implementation of corrective measures in a timely manner, thus ensuring higher yield rates and superior product quality. In parallel, diagnostic analysis within JMP offers a deep dive into the data, enabling manufacturers to identify and address root causes of failures across the intricate web of production processes.

This presentation showcases real-world applications of these JMP features, demonstrating their pivotal role in streamlining semiconductor manufacturing workflows. See how predictive modeling and diagnostic analysis can be effectively employed to optimize production outcomes, reduce costs, and enhance product reliability. Join us in exploring the cutting-edge analytical strategies that promise to redefine the future of semiconductor manufacturing.


Good morning, everyone. My name is John Xu, and my colleague is Suraj Sindia. Both of us are from Intel Corporation. As you know, Intel is a semiconductor manufacturing company; we make computer chips and other chips as well. Today, we would like to present one piece of research we have done. The title is Revolutionizing Semiconductor Manufacturing Test with JMP.

We apply AI methodology inside JMP to help us do predictive modeling for test steering. Our presentation has four sections. First, we briefly introduce what the semiconductor manufacturing industry looks like, and then we describe the predictive modeling methods we used in JMP, namely neural networks and boosting.

Then we show a JMP demo and case study on adaptive testing for manufacturing test. We go through the JMP application and use the JMP neural network and PCA platforms. Finally, we present our summary and conclusions.

First, an overview of the semiconductor industry. It is a critical and rapidly evolving sector that lies at the heart of modern technology. Semiconductors, which we also call integrated circuits or ICs, are essential components found in a vast array of electronic devices: smartphones, computers, automobiles, and many others, even your home refrigerator.

This industry is characterized by a high level of innovation and significant capital expenditure; a single fab may cost around $10 billion to build. There is also strong market demand across the market segments related to semiconductors, such as memory, like DRAM and flash memory, and microprocessors.

The most familiar example is probably the CPU in your computer. There are also commodity integrated circuits, standard circuits used across many different device types such as automobiles, ASICs, or application-specific integrated circuits, and analog chips used in things like traffic lights and so on.

Next, a very brief, high-level look at the semiconductor manufacturing business models. There are essentially three types of company: IDMs, fabless companies, and pure-play foundries. A fabless company only does the chip design but does not make the chips themselves; we call those companies fabless, like AMD, Nvidia, or Qualcomm.

Also, some companies don't have their own products or their own IC designs, but they build ICs for the fabless companies. The pure-play foundries include TSMC or [inaudible 00:03:39] Foundry. And some companies do everything together, like Intel or Samsung.

Basically, we have our own chip designs, we make the chips ourselves in our own fabs, and then we also assemble the final chips and ship them to customers like HP or Dell, who build the computers for the end customers. Next, at a high level, here is the semiconductor manufacturing process.

The semiconductor manufacturing process actually involves many hundreds of different steps, but at a very high level it starts from silicon wafers. Silicon wafer suppliers make the silicon, cut it into wafers, and ship them to Intel. Then we perform many steps to make the chips from those wafers.

The first step is oxidation: growing a layer of silicon dioxide on the wafer surface. Then we do photolithography, transferring the circuit design onto the wafer using light-sensitive chemicals. Then we follow with etching, removing unwanted material to create the desired patterns. Then comes doping, introducing impurities to modify the electronic properties of the silicon, and then deposition.

In deposition, we add layers of material to the wafer through various techniques, one layer on top of another. After that, we do electrical die sort and testing. Today's research focuses on this process, which we call test.

The next slide gives you more detail on how we do the testing process.

Thank you, John. Thanks for the overview of the semiconductor manufacturing process. As John demonstrated, the semiconductor manufacturing process has several stages. Testing is the principal stage where the technical goodness of the die is verified. It is the stage where we check whether all the preceding manufacturing steps have met the requirements for which the integrated circuit, or die, was designed.

Within testing, there are three principal stages or phases. The first is the Electrical Die Sort test. This is where the dies are tested on the wafer, before the dies are even diced. Wafer probes, like the ones shown in this little picture, probe the internal nodes on the die for measurements such as voltages, currents drawn from the supply, leakage currents, and so on.

Units that pass the Electrical Die Sort test are then diced from the wafer and packaged. The packaged units resemble something probably familiar to most of us who have seen a computer motherboard: those black widgets, the integrated circuits.

Once packaged, the units are tested for functionality as well as performance using millions of test patterns. If a unit fails even one of these functional test patterns, it is discarded. If it doesn't meet the performance threshold of a bin, it is down-binned and tested against the next lower performance bin, and so on. If it doesn't meet even the lowest performance bin, it is discarded.

This second principal stage in the manufacturing test flow is called Class Test, or Package Test. Subsequent to the package test stage, we have a socket called Test Steering. This is the stage where an offline tool steers the parts into one of three or more lanes to subject the die to what is called the final System Level Test.

This is the most expensive test socket, where the dies that have now been packaged are mounted on a motherboard and subjected to high-level, application-like tests.

These are things like booting Windows or Linux, and running high-level application programs such as the Chrome browser, or even JMP-like applications, on the part after it is mounted on a motherboard. This makes the test resemble as closely as possible the end use an end user of a computer would put it to. If the part passes this final system level test, it is shipped off to Intel's customers, or the semiconductor manufacturer's customers.

Why do we have VLSI testing, or integrated circuit testing? Testing is crucially important for functional assurance. Every die that is manufactured potentially has a defect landing on it; every incremental manufacturing stage has the potential to introduce defects, and subsequent to these manufacturing stages, you need the testing stage to weed out the defective parts.

In many cases, the defects keep the die from conforming to the functionality it was designed for, and in other cases, the die doesn't meet the performance requirements it was designed for.

Testing plays a crucial role not only in functional assurance but also in performance assurance. Beyond defect detection and functional assurance, it also helps in yield learning, meaning improvement of the manufacturing flow: any incremental defect sites detected through the testing process can be used to improve the manufacturing process, for example by making changes in photolithography or in etching, so that those defects do not recur.

This is extremely important for maximizing top-line revenue and for bottom-line cost savings, in the sense that every incremental percentage of yield improvement leads to substantial savings for the manufacturer.

What are the challenges in semiconductor testing, or VLSI testing? VLSI stands for Very Large Scale Integration, and the name reflects the fact that modern microprocessors integrate as many as 10 billion individual transistors.

On the right-hand side, I have a little clip art of a modern-day microprocessor, where individual transistors can be as small as single-digit nanometers. That's like dividing a single strand of your hair into a thousand parts and fitting a transistor into each one.

The high cost and time consumption of testing come from having to exercise each individual transistor. You can imagine the number of test patterns needed to test every transistor on such a design. This results in large volumes of test data that must be analyzed: to diagnose where the defective transistors happen to be, to verify whether the die is good or bad, and to characterize performance, that is, whether the die conforms to the highest bin or needs to be bracketed into one of the lower bins before it is shipped to our customers.

Based on that lead-up, you can now envision the role of predictive modeling in VLSI testing. There is the potential for immense cost savings and for gleaning useful, actionable insights from a huge data set. Predictive modeling techniques can be used for early detection of defects by identifying potential issues with very few test patterns.

It can be used for significant cost and test time savings by predicting which dies are likely to pass or fail with a small number of tests, as opposed to waiting until the last stage of test. Predictive modeling can also be used to eliminate certain test stages completely, as we will do in this case study, where we eliminate the Test Steering stage, in which a dedicated tester partitions dies into one of many lanes for the final system level test.

Predictive modeling can be used intentionally to eliminate certain test stages altogether where there is a high correlation between stages. The possible improvements from the standpoint of yield learning and quality enhancement are also significant. In our case study, we will examine how we can eliminate the Test Steering stage using a neural network based model trained in JMP.

If you recall, among our three test stages, from Sort to Class and Class to SLT, there is an intermediate Test Steering stage where an offline measurement is made to steer the parts into one of many final system level test lanes.

In this case study, we have three test lanes with different test time durations, labeled TT1 through TT3: TT1 is the longest test lane, for the units with the best performance, and TT3 is the shortest, for units with the worst performance. What we are trying to do is use predictive modeling in JMP to steer individual dies into one of these test lanes, instead of relying on the XC2 measurements made by an offline measurement tool.

We are trying to use the inline measurements already made at the Sort and Class test stages to predict which test lane each individual die should be subjected to. Our data set has 13,186 tested units. Each unit has 16 sensor signals measured in situ: five of these signals are categorical and 11 are numeric inputs. The output is one of three categorical values, TT1 through TT3.

As you can see, the distribution is fairly skewed, with TT3 dominating and TT1 being the fewest: the counts range from about 564 units for TT1 to 10,488 for TT3.

X1 through X5 are the categorical inputs. They span anywhere from as few as two levels to as many as 22 levels. Among the numeric inputs, X6 through X16, we have a fairly diverse spread: X6 has a range as small as 0.01 to 1.5, while X14 has the biggest spread, from as low as zero to more than 2,500. So we have a fairly diverse set of inputs.
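
For readers who want to reproduce this kind of input summary outside JMP, here is a rough pandas sketch. The file name and column names (X1-X16, TT Lane) are assumptions based on the data described in this talk, not the actual Intel data set.

```python
import pandas as pd

# Hypothetical file; the real data set (13,186 units) is not public.
df = pd.read_csv("test_steering_units.csv")

cat_cols = [f"X{i}" for i in range(1, 6)]    # five categorical inputs
num_cols = [f"X{i}" for i in range(6, 17)]   # eleven numeric inputs

# Class balance of the response (expect a skew toward TT3).
print(df["TT Lane"].value_counts())

# Number of levels per categorical input (reported range: 2 to 22 levels).
print(df[cat_cols].nunique())

# Min/max spread of the numeric inputs (e.g., X6 ~0.01-1.5, X14 ~0-2,500).
print(df[num_cols].agg(["min", "max"]).T)
```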

We will examine building a neural network based baseline model in JMP to predict one of these three categorical values using the 16 input signals collected in situ. To start our model building exercise in JMP, we will begin with the simplest possible neural network model that can still give us some useful insight into the classification accuracy.

We'll start with a single-layer neural network with three neurons, 5-fold cross-validation, and a learning rate of 0.1. We will have no boosting, which means the number of additional models is set to zero. We will set the number of tours to one, meaning we do not do any additional training restarts after convergence.
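
In JMP this configuration is set directly in the Neural launch dialog. As a rough analogue outside JMP, the sketch below uses scikit-learn; it is not the same optimizer or penalty scheme JMP's Neural platform uses, and the file, column names, and categorical encoding are assumptions, but the structure is the same idea: one hidden layer of three tanh units evaluated with 5-fold cross-validation.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.read_csv("test_steering_units.csv")          # hypothetical file
cat_cols = [f"X{i}" for i in range(1, 6)]
num_cols = [f"X{i}" for i in range(6, 17)]
X, y = df[cat_cols + num_cols], df["TT Lane"]

prep = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ("num", StandardScaler(), num_cols),
])

# Baseline: one hidden layer, three tanh neurons, no boosting.
baseline = Pipeline([
    ("prep", prep),
    ("nn", MLPClassifier(hidden_layer_sizes=(3,), activation="tanh",
                         learning_rate_init=0.1, max_iter=2000,
                         random_state=1234)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1234)
scores = cross_val_score(baseline, X, y, cv=cv)
print("Mean 5-fold accuracy:", scores.mean())
```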

For every individual neuron, we will set the activation function to the hyperbolic tangent, TanH. Now let's get into JMP for a live demonstration of the model building process. Let me share my data table.

Okay, I have my data table here. What you see are the inputs X1-X16 and the test time lane, TT Lane, which takes one of the three values TT1-TT3. Now we start the predictive modeling. Within the Analyze menu, under Predictive Modeling, we launch the neural network training.

For our X factors, meaning the inputs, we will use X1-X16. For the Y response, we will use TT Lane. Hit OK. That brings up the neural network model launch. Within this, we will use KFold cross-validation with, as I pointed out, the number of folds set to five.

For the random seed, we will use the same number I used for the slides, so the numbers are consistent. For the first layer, we will have three neurons; this is the baseline model. We will not have any neurons in the second layer. We will use the TanH activation function.

It's basically a sigmoid-shaped function. We will have no boosting, so we set the number of models to zero, and the learning rate to 0.1. Now we hit Go to start the training process. The model gets built and trained quickly because it is a simple model. What you see is a fairly decent generalized R-Square, even with just three neurons for this data set of 13,000 units.

Since we used KFold with five folds, 80% of the data is used for training and 20% for validation in each fold. You will notice the generalized R-Square is as high as 0.929 and 0.93 for the training and validation data sets, respectively. We still have some misclassification. If you look at the confusion matrix, you see the actual TT Lanes, meaning the ground truth, on the rows and the predicted classes on the columns.

The highest misclassification rate in this data set is about 20%: in 20% of cases, TT1 is predicted as TT2 by the neural network model. For our purposes, a misclassification rate of 15% or lower is ideal, but 20% or lower is acceptable. We will demonstrate how we can continue to improve the misclassification rate.
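
JMP reports the confusion matrix directly in the model report. If you are following along with the hedged scikit-learn analogue above, the equivalent check looks like this, continuing with the hypothetical `baseline`, `X`, `y`, and `cv` objects from the previous sketch and assuming the TT Lane values are literally "TT1" through "TT3".

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

# Out-of-fold predictions, so every unit is scored by a model that never saw it.
y_pred = cross_val_predict(baseline, X, y, cv=cv)

labels = ["TT1", "TT2", "TT3"]
cm = confusion_matrix(y, y_pred, labels=labels)
print(pd.DataFrame(cm, index=labels, columns=labels))   # rows = actual, cols = predicted

# Per-class misclassification rate, e.g. how often TT1 is predicted as something else.
per_class_miss = 1 - np.diag(cm) / cm.sum(axis=1)
print(dict(zip(labels, per_class_miss.round(3))))
```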

Now we plot the neural network diagram. As you can see, we have all 16 inputs, the three neurons we designated for the first layer, each with the TanH activation function, and TT Lane as the output. Now we shall increase the model complexity. As I said, this is the baseline model; let us make the model more complex to try to improve the misclassification rate.

We had three neurons in the first layer and zero neurons in the second layer; now we will give the second layer two neurons. We still have no boosting and keep the same learning rate. The learning rate can be changed depending on how fast you want convergence, at the expense of accuracy.

If you increase the learning rate, convergence will be faster, but you risk settling into a local minimum instead of the global minimum. We can also choose among different penalty methods; we will go with the squared penalty. The number of tours is set to one.

As the model gets built, you will see that the R-Square has already improved over the baseline model, which, if you recall, was 0.92 and 0.93. Our misclassification rate has also improved, from 0.20 to 0.18, basically 18%. We can look at the diagram: we have two layers here, three neurons in the first layer and two neurons in the second layer.

Now let us increase the model complexity some more and examine how this changes our misclassification rate. One thing you have to be careful about when increasing model complexity is the risk of overfitting to the data, such that the model loses generality. With about 13,000 units, we shall examine how many parameters that results in.

As you will notice, the training takes longer as we increase the number of neurons in the model. Here we have the generalized R-Square for the increased model complexity. As you can see, the accuracy does not always improve with a simple increase in the number of model parameters, because of the risk of overfitting I mentioned.

The R-Square is 0.937 and 0.936, slightly better than the three-by-two network, but not by a whole lot. You will also see that the misclassification rate has in fact slightly degraded with the increased complexity. If we plot the diagram, we have as many as five neurons in the first layer and seven neurons in the second layer.

Increasing model complexity does not always result in a better misclassification rate. This is where you use smarter tuning techniques, such as boosting and combinations of model parameters and boosting, to increase accuracy. Now, given the accuracy and misclassification rate we have achieved, let us examine how we can pare down the input complexity.

We are using 16 inputs. How can we reduce the number of inputs while retaining the same level of accuracy? As I noted, increasing the model complexity resulted in slightly worse misclassification. How can we reduce the overfit nature of the model by reducing the number of inputs? This is where JMP comes in handy: we can use the Principal Components platform under Multivariate Methods to help us build a model with a smaller number of inputs.

We will now compute the principal components of our input data set and pick the number of principal components that results in the fewest errors. We have X1-X16 as inputs, but we select only the numeric inputs, because the Principal Components platform in JMP analyzes only numeric inputs.

We get 11 principal components from the data set, but each incremental principal component captures less of the input variance. We can use the scree plot to see where the knee of the PCA happens to be; as you can see, the knee occurs at about two principal components, so the substantial majority of the variance is captured by just two principal components.

But let's also plot the cumulative percentage of variance explained by the eigenvalues. About 90% of the input variance is captured by just six eigenvectors, or six principal components. We will now use these six principal components, supplemented with the categorical inputs, for our neural network training. If you recall, when we launched the principal component analysis, JMP would not take the categorical inputs, only the numeric inputs.
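
JMP's scree plot and cumulative eigenvalue report drive this choice interactively. As a hedged analogue, here is how the same "keep enough components for roughly 90% of the variance" rule could be expressed with scikit-learn on the eleven numeric inputs, with the file and column names again assumed.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("test_steering_units.csv")           # hypothetical file
num_cols = [f"X{i}" for i in range(6, 17)]             # PCA uses numeric inputs only

Z = StandardScaler().fit_transform(df[num_cols])       # standardize, i.e. PCA on correlations
pca = PCA().fit(Z)

cum_var = np.cumsum(pca.explained_variance_ratio_)
k = int(np.searchsorted(cum_var, 0.90) + 1)            # smallest k reaching 90% variance
print("Components for >=90% variance:", k)             # about six in the talk's data

# Score the chosen components and keep them as new model inputs.
pc_scores = PCA(n_components=k).fit_transform(Z)
pc_df = pd.DataFrame(pc_scores, columns=[f"Prin{i+1}" for i in range(k)])
```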

For our neural network, we will have to supplement the chosen principal components with the categorical inputs and build the network from those. Now we come back to the data table. I have saved the principal components from this analysis. We will pick the principal components and build the neural network model again.

I have the six principal components I generated and saved as inputs, and I supplement them with X1-X5 as the other inputs. I still want TT Lane to be my Y response, or output. I use the same KFold cross-validation and random seed. We use three neurons for the first layer and two neurons for the second layer.

For the number of additional models, we will go with zero, meaning no boosting initially; then we will examine boosting. Now we hit Go. You'll notice I'm using the same number of folds, meaning 20% of the data for validation and 80% for training.
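
Outside JMP, the same "six principal components plus five categorical inputs" model can be expressed as a single pipeline, so the PCA is refit inside each cross-validation fold and validation data never leaks into the components. This is again a hedged sketch with assumed column names, not the exact JMP fit; `X` and `y` are the inputs and TT Lane response loaded in the earlier sketch.

```python
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

cat_cols = [f"X{i}" for i in range(1, 6)]
num_cols = [f"X{i}" for i in range(6, 17)]

prep = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ("pca", Pipeline([("scale", StandardScaler()),
                      ("pca", PCA(n_components=6))]), num_cols),
])

model = Pipeline([
    ("prep", prep),
    ("nn", MLPClassifier(hidden_layer_sizes=(3, 2), activation="tanh",
                         learning_rate_init=0.1, max_iter=2000,
                         random_state=1234)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1234)
print(cross_val_score(model, X, y, cv=cv).mean())   # X, y from the earlier sketch
```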

Here we are. We notice that we have achieved about the same level of accuracy as using all 16 inputs. The first thing to check is that we only have 11 inputs now: the six principal components and the five categorical inputs. With this three-by-two neural network, the misclassification rate is respectable; it's about the same as what we got with all 16 inputs and the more complicated five-by-seven network.

Where do we go from here? Let us increase the number of neurons in the model and examine how that helps our misclassification rate. As the model complexity increases, training takes longer; you will notice that when I introduce boosting, training across the data will take even longer.

You'll see that now, with the seven-by-five network, our accuracy has gone up, and that is because we pared down the inputs. It is counterintuitive: reducing the inputs can sometimes increase accuracy, given the right choice of model complexity.

That's the insight that is non-trivial to glean from this. Let's take a look at the diagram. As I said, we have the six principal components and the five categorical inputs, and the model complexity has gone up to a seven-by-five network.

Now let us add boosting. One thing to keep in mind about JMP is that boosting only works with the first layer, meaning JMP ignores any neurons specified in the second layer; only the first layer neurons are used for boosting.

We will now have no neurons in the second layer, set the number of boosting models to five, and keep three neurons in the first layer. With boosting, each additional model is fit to the residual error between the prediction and the ground truth, so that error is reduced with every incremental model. We have a total of 15 neurons in this case.

We will now demonstrate boosting. The model gets scaled by the number of models selected here, in this case five, which acts as a multiplier on the number of neurons in the first layer. The resulting model will have 15 neurons and is 5-fold cross-validated against the training data set. We'll look at the fitted model, and then we will wrap up our presentation.
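
To make the boosting idea concrete: each additional small model is fit to the residual of the running prediction, and its contribution is shrunk by the learning rate. The sketch below is a deliberately simplified, regression-flavored illustration of that additive scheme using tiny tanh networks. JMP's neural boosting works on the categorical response with its own penalized fitting, so treat this only as a conceptual analogue, not the JMP algorithm.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def boosted_fit(X, y, n_models=5, learning_rate=0.1):
    """Fit n_models small tanh nets sequentially, each on the current residual."""
    models, running_pred = [], np.zeros(len(y))
    for i in range(n_models):
        net = MLPRegressor(hidden_layer_sizes=(3,), activation="tanh",
                           max_iter=2000, random_state=i)
        net.fit(X, y - running_pred)                  # learn what is still unexplained
        running_pred += learning_rate * net.predict(X)
        models.append(net)
    return models                                     # 5 models x 3 neurons = 15 neurons total

def boosted_predict(models, X, learning_rate=0.1):
    return sum(learning_rate * m.predict(X) for m in models)
```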

As I mentioned, with boosting the training takes longer. As you can see, because we have only a single layer with boosting, our R-Square came out lower than the five-by-seven network, but it is better than the baseline model we initially built. What you see here is the baseline model versus the 5-fold cross-validated boosted model.

The boosted model can be seen here, against the baseline model. The accuracy can also be seen here; it is very comparable between the boosted model and the baseline model.

Now we wrap up the presentation. Let's go back to our slides. As you can see, we trained the baseline model, then reduced the input complexity through PCA: we selected the six principal components, supplemented them with the categorical inputs X1-X5, and trained our neural network.

Paring down the inputs resulted in improved accuracy. That is the key insight: an increase in model complexity does not necessarily improve model accuracy, but paring down the input complexity can aid the accuracy improvement. We also demonstrated the scree plot and chose the six basis vectors, the six principal components, for training.

This is another experiment, which we skipped today, where we used only the categorical inputs. As you can see, the accuracy dropped substantially, which means you cannot pare down the input complexity endlessly; doing so results in a less accurate model.

Then we chose a boosting level of five. We also show, in this case, a boosting of two with three neurons; as you can see, when we increase the boosting from two to five, we see an improvement in the misclassification rate and the R-Square.

We can compare across such networks. While we built many models, I'm comparing just the three principal models we ultimately ended up using. For us, a simple neural network with a boosting of five, using the six principal components and the five categorical inputs, met the accuracy threshold we were looking for.

In each case, you will have to examine what the accuracy requirements of your application happen to be, then choose the model judiciously after experimenting with the training parameters, model complexity, and input complexity. In our presentation today, we examined how neural network models can be used in JMP for our targeted application, namely adaptive test steering for integrated circuits.

We examined different predictive models: a simple neural network, a neural network with principal components, and a neural network with principal components and boosting. We demonstrated the improvements in terms of the testing resources that are no longer needed.

In this case, we were able to eliminate the XC2 measurement tool, which can cost as much as $100,000 per machine, by using the in situ measurements already made at the Sort and Class test stages to predict the system level test lane. Such opportunistic cost reduction in testing is possible through simple two-layer neural networks.

One thing I must mention: in the JMP user interface, only a neural network with up to two layers can be trained, but you can expand on the neural network complexity using JSL, where you can customize and build a deeper neural network, as you may need for higher-end applications with larger data sets. That is our presentation for today. Thank you for listening.