
Assessment of Dielectric Reliability in Semiconductor Manufacturing

Reliability assessment of devices and interconnects in semiconductor technologies is typically done for technology certification and periodic monitoring, using relatively small (single digit to tens) sample sizes per condition. Volume manufacturing data from ramped voltage-breakdown measurements on scribe-lane test structures can also be used to assess dielectric reliability. Over time, this provides a detailed view of dielectric behavior, including the mixture of intrinsic and extrinsic mechanisms affecting dielectric integrity. In particular, low failure-rate outliers or tails, which may otherwise pose field-quality risks, can be detected and addressed.

For practical reasons, the ramp may be stopped at a target voltage to reduce test time and avoid damage to probe cards and needles, which may result in a small number of data points being censored. Fitting large data sets with a small number of censored data points can lead to convergence challenges, resulting in incorrect fitting parameters and lack of confidence intervals, as well as posing significant computational challenges.

This work explores these challenges with the JMP Life Distribution platform and examines alternatives and solutions to allow correct analysis, fitting, and extrapolation.


Hello, JMP Community. My name is Mehul Shroff. I'm with NXP Semiconductors in Austin, Texas. Along with Don McCormack from JMP, we are going to present a study on the Assessment of Dielectric Reliability in Semiconductor Manufacturing.

To introduce the topic, this is a cartoon of a basic semiconductor transistor known as a MOSFET. Within the structure, we have a gate dielectric that is used to insulate the gate from the channel, and this is what acts as the switching element in the transistor. The gate controls the flow of charge carriers in the channel. The integrity of the gate dielectric is crucial for device reliability in the field. One way we can look at this is through high-volume manufacturing ramped-voltage breakdown studies, measurements that can help us detect drifts and defectivity issues in the gate dielectric. This is typically done on scribe-line test structures, where we do have smaller areas, but over time we can get larger sample sizes. In the semiconductor world, we think of field quality in terms of the well-known Bathtub Curve, where the observed failure rate initially decreases with time and then increases later in life. This is due to the contributions of a few different groups of mechanisms.

The first group, known as Extrinsic Reliability mechanisms, mainly deals with early failures such as latent defects and infant mortality. We can reduce this quite a bit through various screens and tests that we do before shipping the parts out. The next group comprises Constant or Random Failures such as soft errors, latch-up, and ESD. The last group comprises Intrinsic Reliability mechanisms, which cover the wear-out of the device, including dielectrics, that occurs and increases over time. When we collect dielectric breakdown data, we typically would like to represent it by a Weibull, because that is the distribution ideal breakdown data would follow. But in practice, our distributions are often mixtures, as is shown in this data set, where we can have an extrinsic tail due to defects or local thinning, and then most of our data, hopefully, is intrinsic or natural breakdown. Even that can be convoluted by process variations, such as those shown here in this wafer map where the gate dielectric thickness varies over the wafer, causing differences in the intrinsic breakdown and the curvature that we see on the high side.
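(As a rough illustration outside of JMP, here is a minimal Python sketch of the kind of mixed breakdown distribution described above, plotted on a Weibull scale where a pure Weibull would be a straight line. All parameter values and sample sizes are assumptions chosen for illustration, not the data from the talk.)

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Assumed, illustrative parameters: intrinsic breakdown near ~9 V with a tight
# shape, plus a small extrinsic (defect-driven) tail at lower voltages.
intrinsic = 9.0 * rng.weibull(30.0, size=9500)   # scale * Weibull(shape)
extrinsic = 6.0 * rng.weibull(3.0, size=500)
v_bd = np.concatenate([intrinsic, extrinsic])

# Empirical CDF via median-rank plotting positions, then transform to the
# Weibull scale: a pure Weibull is a straight line in ln(V) vs ln(-ln(1 - F)).
v_sorted = np.sort(v_bd)
n = v_sorted.size
F = (np.arange(1, n + 1) - 0.3) / (n + 0.4)

plt.plot(np.log(v_sorted), np.log(-np.log(1.0 - F)), ".", ms=2)
plt.xlabel("ln(breakdown voltage)")
plt.ylabel("ln(-ln(1 - F))  (Weibull scale)")
plt.title("A mixture shows up as curvature and a low-voltage tail")
plt.show()
```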

Then some sites don't fail within our test time or max voltage.

To avoid probe needle damage and keep the test time manageable, we stop the testing, and that then becomes our censored data. Now, why do we need to fit this? Mainly because we want to be able to monitor drifts over time that result in changes of the parameters, we want to understand the impact of process changes or improvements, and we want to be able to project out to low-PPM behavior for high reliability applications such as automotive. But what we see is that when we try to fit these mixed distributions, we often run into problems where we can't converge and therefore don't get confidence intervals, as shown here, where we get the nominal fit but not the confidence intervals. This can be due to a combination of large data sets, imperfect or mixed distributions, and censoring. Here's an example of the same data where we have a distribution with some censoring, and we can't converge. Here, if we treat the censored data as failing data, which is clearly incorrect, but just for the purposes of illustration, we are able to get it to converge.
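(For readers who want to see how censoring enters the fit, here is a minimal Python sketch of a right-censored Weibull maximum-likelihood fit; it is not the JMP Life Distribution implementation, and the stop voltage, sample size, and Weibull parameters are assumptions for illustration.)

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

rng = np.random.default_rng(1)

# Assumed example: breakdown voltages from a Weibull (shape 30, scale 9 V),
# with the ramp stopped at 9.5 V. Sites surviving past the stop voltage
# (well under 1% here) are right-censored rather than observed failures.
v = 9.0 * rng.weibull(30.0, size=100_000)
stop = 9.5
observed = v <= stop            # True = measured breakdown, False = censored
t = np.where(observed, v, stop)

def neg_log_lik(params):
    shape, scale = params
    if shape <= 0 or scale <= 0:
        return np.inf
    # Failures contribute the log-density; censored sites contribute the
    # log-survival probability at the stop voltage.
    ll = weibull_min.logpdf(t[observed], shape, scale=scale).sum()
    ll += weibull_min.logsf(t[~observed], shape, scale=scale).sum()
    return -ll

fit = minimize(neg_log_lik, x0=[5.0, 10.0], method="Nelder-Mead")
print(fit.x)   # estimated (shape, scale), close to (30, 9) here
```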

But we see that the fit itself is slightly worse than what we had before and doesn't quite fit the intrinsic distribution where we'd expect it to be. This is just an example to show that the presence of censored data plays a role in our ability to fit this data.

Here's an example where we had a process improvement. We had breakdown between the gate and the source/drain of the transistor, resulting in a pretty severe extrinsic tail, and through a series of process improvements, we substantially cleaned it up. We see here that one data set had some censored points, and we were unable to converge. The other data set didn't have any censored points, and we were able to converge fine. Even though we can tell visually that we have an improvement, we are unable to judge statistically what the improvement is. We can see the same thing here in our distributions, where the mean of the distribution changes by a little bit, not a whole lot, but the minimum at each site has significantly improved if you look at the scale and the color gradient.

Here we took the same data as before, and we see we have the same convergence problem. Here we have a different data set where we also had two versions of the process, one slightly better than the other, though both showed significant tailing. However, there was a difference in the intrinsic fits. In this case, both had censored data, and yet they both converged. To test whether this was related to the size of the data set, we took this data set, inflated every point (by 10x, I think) to maintain the same distribution, and repeated the analysis. Even though the number of censored points and the total number of data points increased, we were still able to converge. Based on this, we were able to rule out the size of the data set as a factor. We thought that this was instead driven by the Weibull scale and shape factors.

At this point, I'd like to hand off to my co-author, Don, to take it from here on. Please go ahead, Don. Thank you.

Thanks, Mehul. Let me get into slide mode and we will start talking. Even though we're talking in the context of semiconductor manufacturing, this really applies to any case where we're collecting large amounts of data and there can be various factors that are impacting the failure mechanisms. Like I said, just because we're talking about semiconductor data doesn't mean it only applies to semiconductor data.

Let's talk a little bit about what happens with the Life Distribution platform, because that's what we're using primarily. Just to look under the hood and get a little bit of an understanding of what's going on and what might be causing these poor fits and these non-convergence issues. Let's start off by saying that what we're trying to do with Life Distribution is optimize a function; specifically, it's the likelihood function that we're looking at. The important thing about the likelihood is that it is going to give me the most likely, or highest probability, parameter estimates given the data. That's the important aspect, because this approach is data-driven. The other wonderful thing about likelihood functions and the Weibull distribution is that because there are only two parameters in the Weibull distribution, the alpha and the beta, this is a really easy problem to look at visually.

If you take a look at the graph on the left, that's just the likelihood function plotted in three dimensions, where I've got beta and alpha on my X and Y. What we're trying to do is find the minimum point, the minimum of that graph. Looking at the graph, you can tell that regardless of where you start on the graph, wherever the starting values are in the parameter space, you're pretty quickly going to converge at that minimum point. When the data is well behaved, things go quickly, things go really, really smoothly. Now, the question is, what went wrong, or what happened to cause the non-convergence issue that we saw in the data that Mehul was talking about? In most cases, a lot of this can be traced back to observations in the data set that are just not representative of the principal underlying model. Now, obviously, the first thing that a lot of people think about are outliers, so I can have outliers in the data that are causing problems.
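(A minimal Python sketch of that picture, under assumed parameter values: plot the negative log-likelihood of a well-behaved Weibull sample over a grid of shape and scale values and note the single, clean minimum that any reasonable starting point will find.)

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import weibull_min

rng = np.random.default_rng(2)
t = 9.0 * rng.weibull(30.0, size=2000)    # assumed well-behaved, complete data

# Negative log-likelihood over a grid of (shape, scale) values.
shapes = np.linspace(20, 40, 60)
scales = np.linspace(8.5, 9.5, 60)
nll = np.zeros((scales.size, shapes.size))
for i, a in enumerate(scales):
    for j, c in enumerate(shapes):
        nll[i, j] = -weibull_min.logpdf(t, c, scale=a).sum()

plt.contourf(shapes, scales, nll, levels=40)
plt.colorbar(label="negative log-likelihood")
plt.xlabel("shape (beta)")
plt.ylabel("scale (alpha)")
plt.title("Well-behaved data: one clean minimum")
plt.show()
```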

The data can come from more than one distribution. I could have multiple failure mechanisms, and they can be part of the data set.

To be fair, to be honest, and to be complete, obviously, the underlying model could be wrong. For example, maybe the Weibull wasn't the best distribution. Because there's a fairly decent device-physics understanding that the Weibull distribution really is the best distribution to use in this case, we're going to discount that very last bullet. So, knowing that we have some data that's not representative, what can we do in the context of some of the JMP platforms to help us fit a better model or maybe diagnose our problems? One option is to change the algorithm or to change the starting parameter values. Now, this is not directly available in Life Distribution, but there's another wonderful platform within JMP, Nonlinear, that allows me to do this, with a little bit more flexibility than Life Distribution. Sometimes when I run into these problems, I can use Nonlinear. I can obviously find and remove those values that are not representative of the main underlying model. There are lots of tools in JMP to allow us to do that.

I'm also going to show you a tool you might not be familiar with, and that is Fit Parametric Survival, using the data filter to find problem observations in the data.

Then finally, we could find and build a more representative model. This obviously could be done using the Nonlinear platform. However, we're going to use a much easier part of the Life Distribution platform, the Fit Mixture, to be able to do that. Again, just a real easy, simple recap of the tools. We're going to be looking at Nonlinear, the Nonlinear platform. We're going to be looking at Fit Mixtures under the Life Distribution platform, and then we're going to look also at Fit Parametric Survival.

Let me go to the examples. I'm going to start with a real simple example. I've simulated data from a Weibull distribution, so nothing fancy here, very straightforward. However, what I'm going to do is I am going to add an outlier to the data. Here's my simulated data with the outlier. So 700,000 observations were the bulk of the data. One observation was the outlier, so this guy right out here. Now, you might say it's inconsequential, easy to visualize that, pull it out of the data. However, this does cause a problem with the Life Distribution platform.

If I were to run that with the Life Distribution platform, you'll notice that I have non-convergence issues. It's all due to this one outlier right here at the right-hand tail. What can I do in those cases? As it turns out, Life Distribution gets me most of the way there. I can use the Nonlinear platform to get me the rest of the way there. Let's open up Nonlinear. The one difficulty with Nonlinear is that you have to know the form of the Log Likelihood function. There's a lot of that available in the literature, and certainly anyone you know with a statistical background can help you come up with these. I've got my Log Likelihood in one of the columns of my data set right here. I'm going to use the Nonlinear platform, and what I'm going to do is start with the parameter estimates that I got from Life Distribution. I'm going to start with a beta of about 35 and an alpha of about 5.6, and I'm just going to click Go.

As it turns out, with that amount of control, I am able to converge and come up with parameter estimates.
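(Here is a minimal Python sketch of the same idea, warm-starting a generic optimizer from approximate estimates on interval-censored data; the ramp step size, sample size, and true parameters below are assumptions for illustration, and the starting values of roughly beta 35 and alpha 5.6 echo the demo above.)

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

# Hypothetical interval-censored ramp data: each breakdown is only known to
# fall between the last voltage step passed (lower) and the failing step (upper).
rng = np.random.default_rng(3)
v = 5.6 * rng.weibull(35.0, size=50_000)
step = 0.05
lower = np.floor(v / step) * step
upper = lower + step

def neg_log_lik(params):
    shape, scale = params
    if shape <= 0 or scale <= 0:
        return np.inf
    # Each interval contributes log(F(upper) - F(lower)).
    p = (weibull_min.cdf(upper, shape, scale=scale)
         - weibull_min.cdf(lower, shape, scale=scale))
    return -np.log(p).sum()

# Warm start from the approximate estimates the first platform produced.
fit = minimize(neg_log_lik, x0=[35.0, 5.6], method="Nelder-Mead")
print(fit.x, fit.success)
```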

If I were to actually calculate the log likelihood, I see that Life Distribution really came close; it just didn't get that last step along the way, and I used the Nonlinear platform to take me there. Now, that said, there still can be issues. Here I have another set of data. Let me show you the graph of it first. This looks, again, almost identical to that first example. The only difference here is that I pushed that outlier further to the right. Obviously, if I were to try to fit this in Life Distribution, I get the same problems I saw with that first set of data. If I were to pull this up into Nonlinear and run my convergence, I get a warning message saying it converged, but I've got one loss formula result that had a missing value. If I go and save these estimates, what I see under that column for my second example is that for two of those observations, two of my intervals, the log likelihood can't be calculated.

This is the problem occurring with the Life Distribution platform: when I have these observations sitting way out on the right tail, they break the log likelihood function.

I don't have time to go into a whole lot of detail as to when exactly this happens. In our slide deck, we have one slide in there that shows you how to calculate where those values might be. But this is what's happening when you've got those outliers that sit far off in the tail.
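(To make that concrete, here is a minimal Python sketch of how an interval far out in the tail can break the log likelihood numerically; the parameter and interval values are assumptions for illustration, and the survival-function workaround shown is a generic numerical trick, not necessarily what JMP does internally.)

```python
import numpy as np
from scipy.stats import weibull_min

shape, scale = 35.0, 5.6          # parameters appropriate for the bulk of the data

# An interval sitting far out in the right tail relative to that distribution.
lower, upper = 6.4, 6.45
p = (weibull_min.cdf(upper, shape, scale=scale)
     - weibull_min.cdf(lower, shape, scale=scale))
print(p)                          # 0.0 -- both CDF values round to 1.0 in double precision
print(np.log(p))                  # -inf, so the total log likelihood is undefined

# The survival function keeps precision in the tail, so computing the interval
# probability as S(lower) - S(upper) instead gives a tiny but positive value.
p_sf = (weibull_min.sf(lower, shape, scale=scale)
        - weibull_min.sf(upper, shape, scale=scale))
print(p_sf, np.log(p_sf))         # roughly 3e-47, with a finite log
```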

Let's move on to the second example. Here we've got something slightly different. Rather than having just one observation that's sitting out on the tail, let's say we've got really two sets of data. We've got two failure mechanisms. I've got my main failure mechanism that's simulated from the same set of data. Then I've got this secondary failure mechanism that's about 10% of the observations that I see in the original data set. Rather than separating them out cleanly, I get data that looks like this, and I'm just going to change my side-by-side bar graph to a stacked bar graph. In essence, what I'm looking at is a slightly fatter distribution, a little bit fatter than what I would expect to see from a Weibull. Let's say I didn't know that by just looking at the data and I go into the Life Distribution platform.

As it turns out, I can fit a Weibull, a single Weibull, to this data. But looking at my comparison distribution plot, there's pretty severe model misfit. Even though I had convergence, I've got some pretty bad model misfit. Here's a case where, let's say, I have enough process understanding to know that I've got multiple failure mechanisms. What I can do is go into the Life Distribution platform and ask for a mixture. Here I'm going to start with two Weibulls. I can specify whether they're on top of one another, slightly separated, or completely separated. I'll just assume that they're right on top of one another, and it fits a reasonably good model. As a matter of fact, let me do one other thing, too: I'm going to change this axis so that it is on the Weibull scale. There we go. You can do that with any graph within JMP.

You'll see that this is a considerably better model fit. As a matter of fact, if I go back up to the top, I've got my model comparison, and I notice that I do a much better job fitting that second Weibull distribution in my data.
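(Outside of JMP's Fit Mixture option, a two-component Weibull mixture can also be fit by direct maximum likelihood; here is a minimal Python sketch under assumed component parameters and mixing fraction. Starting values matter for mixtures, which mirrors the separation choice mentioned above.)

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

rng = np.random.default_rng(4)

# Assumed mixture: ~90% from the main mechanism, ~10% from a second mechanism.
t = np.concatenate([9.0 * rng.weibull(30.0, size=90_000),
                    7.0 * rng.weibull(8.0, size=10_000)])

def neg_log_lik(params):
    w, c1, a1, c2, a2 = params
    if not (0.0 < w < 1.0) or min(c1, a1, c2, a2) <= 0:
        return np.inf
    # Mixture density: w * f1 + (1 - w) * f2.
    pdf = (w * weibull_min.pdf(t, c1, scale=a1)
           + (1.0 - w) * weibull_min.pdf(t, c2, scale=a2))
    return -np.log(pdf).sum()

# Rough starting values read off the histogram; mixtures are sensitive to these.
start = [0.9, 25.0, 9.0, 6.0, 7.0]
fit = minimize(neg_log_lik, x0=start, method="Nelder-Mead",
               options={"maxiter": 20_000, "maxfev": 20_000})
print(fit.x)   # (weight, shape1, scale1, shape2, scale2)
```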

Like with the first method, there are things that will cause this approach to break, and I've created a data set to illustrate when you might start to worry and what you might be able to do. In this case, I have a second distribution that, again, looks very much like that first distribution; let me put them side by side so we can compare them. The only difference is that, one, just like in the outlier case, I have pushed the observations further out in the tail. Secondly, and not quite as obvious from the graphs, in that second example I have considerably fewer observations in that upper tail, about one one-hundredth the number of observations. I think I used about 70,000 observations in the first simulated second data set, and only about a thousand, or maybe a little more than a thousand, here. So it's a considerably smaller data set. If I were to go and fit that using Life Distribution, the fit actually looks better; however, I still have my non-convergence issues.

Additionally, when I try to fit my mixtures, in this case, I fit my two Weibulls, I have non-convergence issues.

I could certainly try different combinations of distributions, but I'd be hunting and pecking in that case. Really, the problem here is that I've got these observations, a small set of data, pushed way out on the tail. There have got to be better approaches to model this. As it turns out, because this data is separated, if we take a look at this region of our space, there are no observations. It might be safe to think that these observations come from one of my distributions and the rest of the observations come from my other distribution. I can do that: I can model these two different distributions separately. It's pretty straightforward. All I do is exclude and hide the part of the data set that I want to, and then I just run Life Distribution, and I've actually preset that up. You see, as it turns out, in this case I have fit...

Let me re-do that. I need to actually delete or hide and exclude all of my observations.

There we go. Hide and exclude. Let's do that one more time and run that. One more time. Something is happening with my script, so we're going to do this manually. We're going to hide and exclude, go under my Life Distribution platform, put in our Example 3 frequencies and our lower and upper points, and fit the Weibull distribution. As it turns out, we see that this does, in fact, fit the data when I don't include both sets of data. At this point, I would probably go back, flip the observations, and refit that other part of the data. When there is clear separation between the sets of data, then I can just fit separate models for each. Let's go on to our third and final example. That is the data that Mehul was nice enough to share with me: legitimate, real-life semiconductor data, with all the warts and bumps that data usually has.
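(When the two populations are clearly separated like this, the same idea can be sketched outside of JMP by splitting at a point inside the gap and fitting each part on its own; the split point, parameters, and sample sizes below are assumptions for illustration.)

```python
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(5)
t = np.concatenate([9.0 * rng.weibull(30.0, size=99_000),    # main mechanism
                    13.0 * rng.weibull(40.0, size=1_000)])   # small, separated upper tail

# Assumed split point chosen from the gap visible in the histogram; everything
# above it is treated as the second mechanism and fit on its own.
split = 10.5
main, tail = t[t <= split], t[t > split]

for label, part in [("main", main), ("tail", tail)]:
    shape, loc, scale = weibull_min.fit(part, floc=0)        # two-parameter Weibull
    print(f"{label}: shape={shape:.1f}, scale={scale:.2f}, n={part.size}")
```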

Now, if you look at this, this looks very different than what we've seen in the past.

I not only have observations in the right tail, but I've got observations in the left tail as well. I don't have that clear separation that I had in the other data sets. Now, granted, I can make arbitrary decisions, such as fitting Life Distribution to those observations, maybe a couple of observations up here, and then to the middle of the observations, but sometimes that's not really an ideal approach. Right off the bat, I tried to set up the Nonlinear platform to fit a mixture model, and you'll notice that there are quite a few intervals for which I can't calculate the likelihood. So it's a very problematic set of data. Aside from trying to fit separate Life Distributions to parts of the data set, another option would be, if I had access to a richer data set with more information about where this data came from, to diagnose where the problem was. Now, keep in mind that here we're looking at probably about two months' worth of data, I believe.

It comes from multiple lots, multiple wafers, hundreds of thousands of observations in this data set. It's likely to have multiple sources of failure mechanisms in here.

The question is, are there certain parts of that data that are really causing the problem with this set of data? Let me go to the data set right now. Again, I want to thank Mehul for sharing this information, and I've really only taken a subset of the data because of time constraints in terms of how long it takes to fit. But this is the original data. There are two different versions, and I have four different lots in here. Each lot has anywhere from a few to about a dozen wafers. The question is, where are the problems coming from? I am going to use a platform that hopefully you're familiar with. If not, it's under the Analyze menu, under Reliability and Survival: Fit Parametric Survival. It looks very similar to Fit Model. I can set it up like Life Distribution, with a lower and an upper column when I have interval censoring. Different from Fit Model, I get to specify factors not only for my location, that is, factors that will influence where the center of the data is, but also factors that will influence how spread out the data is, the scale.
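(As a rough Python analogue of that location-and-scale idea, here is a minimal sketch of a Weibull fit in which a single categorical factor shifts both the log of the scale and the log of the shape; the "version" factor and all parameter values are hypothetical, and this is not the Fit Parametric Survival implementation.)

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

rng = np.random.default_rng(6)

# Hypothetical two-version data set: the "new" version nudges the Weibull scale
# up and tightens the shape, loosely mirroring a process improvement.
n = 20_000
version = rng.integers(0, 2, size=n)                 # 0 = old, 1 = new
true_scale = np.where(version == 1, 9.2, 9.0)
true_shape = np.where(version == 1, 32.0, 28.0)
t = true_scale * rng.weibull(true_shape)

def neg_log_lik(params):
    # Location (log-scale) and dispersion (log-shape) each get an intercept
    # plus a coefficient for the version factor.
    b0, b1, g0, g1 = params
    scale = np.exp(b0 + b1 * version)
    shape = np.exp(g0 + g1 * version)
    return -weibull_min.logpdf(t, shape, scale=scale).sum()

fit = minimize(neg_log_lik, x0=[np.log(9.0), 0.0, np.log(30.0), 0.0],
               method="Nelder-Mead", options={"maxfev": 10_000})
b0, b1, g0, g1 = fit.x
print("scale old/new:", np.exp(b0), np.exp(b0 + b1))
print("shape old/new:", np.exp(g0), np.exp(g0 + g1))
```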

I have set this up already. I'm just going to keep this very simple model and run Fit Parametric Survival. When I do that, you'll notice right off the bat that I have all of these missing values. That's because I've got non-convergence. What's nice about this particular platform is that I can use the data filter to try to diagnose that problem. I'm going to go into my data filter, which I've set up as a conditional data filter, and I click on "new," and you'll notice that with all of the new lots, I get estimates. It's only the old lots that are causing me the problem. Let me see if I can dig a little bit deeper. If I look at the AAA lot, that's a problem. Now I know that it's somewhere in this lot. I'm going to go to wafer one. That looks good. Wafer two, three. Here's one of my problems.

Something is going on with wafer four. At this point, I could probably pull this out. I could probably take a look at it using all the tools that we've seen so far.

I might have to go back into my data store and figure out whether there was something else about the way this lot was processed that caused it to be different. But this is definitely a different lot. If I look through the rest of my wafers, they're all okay, except for this very last wafer; wafer 15 is also an issue. The whole beauty of this approach is that I've got a diagnostic tool that would let me look more deeply into the data if I had that richer data source.
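(The same drill-down can be sketched programmatically: fit each lot and wafer separately and scan for groups whose fit fails or whose parameters stand out. The minimal Python sketch below uses hypothetical 'lot', 'wafer', and 'v_bd' column names and simulated data with one planted problem wafer; it is a stand-in for the conditional data filter workflow, not a replica of it.)

```python
import numpy as np
import pandas as pd
from scipy.optimize import minimize
from scipy.stats import weibull_min

def fit_weibull(t):
    """Simple complete-data Weibull MLE; returns (shape, scale, converged)."""
    def nll(p):
        c, a = p
        return np.inf if min(c, a) <= 0 else -weibull_min.logpdf(t, c, scale=a).sum()
    res = minimize(nll, x0=[10.0, float(np.median(t))], method="Nelder-Mead")
    return res.x[0], res.x[1], bool(res.success)

# Hypothetical stand-in data: two lots, a few wafers each, with one wafer
# carrying a planted far-out observation that distorts its fit.
rng = np.random.default_rng(7)
records = []
for lot in ["AAA", "BBB"]:
    for wafer in range(1, 5):
        v = 9.0 * rng.weibull(30.0, size=500)
        if lot == "AAA" and wafer == 4:
            v = np.append(v, 50.0)
        records += [{"lot": lot, "wafer": wafer, "v_bd": x} for x in v]
df = pd.DataFrame(records)

# Fit lot by lot, wafer by wafer, and scan the table for groups whose fit
# failed or whose parameter estimates stand out from the rest.
rows = []
for (lot, wafer), grp in df.groupby(["lot", "wafer"]):
    shape, scale, ok = fit_weibull(grp["v_bd"].to_numpy())
    rows.append({"lot": lot, "wafer": wafer, "n": len(grp),
                 "shape": round(shape, 1), "scale": round(scale, 2), "converged": ok})
print(pd.DataFrame(rows))
```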

Let's wrap things up. In summary, large-scale reliability data is common in the semiconductor industry. It's a valuable tool for detecting drifts, particularly when you have high-quality products with very low-PPM defectivity. As we've seen, the distributions are very complex, often including multiple failure mechanisms and multiple component distributions. Because of this, we need to rely on some slightly more novel fitting approaches, not only to fit reasonable models to this data but also to diagnose problems that exist in the data. As always, and probably most importantly, subject matter expertise is incredibly important for saying that history and process knowledge tell us this is the best direction to head in.

Thank you for your time, and I hope you got something out of this talk. Thanks.