
Double the Pleasure, Double the Fun! Reliability Under Two Failure Modes

Life got you down? Do you have two failure modes and you're not sure how to make reliability predictions? There is a path to success! Using a straightforward method, an Arrhenius data set of transistor lifetimes with two independent, lognormal failure mechanisms is modeled in JMP. The upper confidence bound on the probability of failure at use conditions is also estimated.

But what about future testing? You may need to test similar parts in the next qualification. How should you design your life tests when there is more than one failure mode? Again, there is a solution!  A graphical method for planning life tests with two independent, lognormal failure mechanisms is demonstrated. Reliability estimates from simulated bimodal data are shown with the contour profiler, helping you navigate this difficulty. This simple graphical approach allows the practitioner to choose test conditions that have the best chance of meeting the desired reliability goal.

Hi, my name is Charlie Whitman. I'm the JMP Systems Engineer for the New York and New Jersey regions. For most of my career, I was a Reliability Engineer in the semiconductor industry. Today, I'm going to talk to you about a subject that is near and dear to my heart, and that is reliability. We're going to be talking about reliability prediction under two failure mechanisms.

I want to start with where we're headed, where are we going? Back in college, I had a mechanical behavior professor, and before class one day, he showed us a specimen that had broken in test. He asked the class, "Why did this thing fail?" He answered his own question. He said, "Too much stress." That's what we're going to be talking about today. We're going to be talking about different types of stress and how they can cause failure.

As we know, there can be multiple failure modes operating on our parts that can cause failure. Again, my background is in semiconductors. This is an example of some ICs on a circuit board: they got too hot, and they failed. Another example is corrosion. Maybe there was a little bit of water ingress into the package, and that caused some corrosion, and then the parts fail. Another possibility is ESD. Perhaps somebody touched this circuit board and they weren't grounded, and they caused an ESD event. But the bottom line is that there are multiple mechanisms out there that can cause failure. We need to have a way of dealing with that and modeling that so we can make predictions.

How do we handle that? The literature is full of examples of how to model this. Typically, we are going to assume that these failure mechanisms operate independently. You can think of the mechanisms as all competing with one another to cause failure. When they act independently, that means they're not looking over their shoulder to see what the other mechanisms are doing and then making an adjustment. They just stay in their lane, and they do the best they can to cause failure first.

How do we do the analysis then? If we want to analyze the results for, say, one particular failure mode, and we assume independence, we can treat all the failure times for the other modes as censored.

What does censored mean when I talk about censoring? Here's an event plot demonstrating what this means. Let's suppose I have three parts on test, and part number one is going along for, say, 10 hours, and after 10 hours, it fails. Then part number two is going along, and it goes along fine for a little bit longer, and then at 15 hours, it fails. Then there's part number three. With part three, we get up to 20 hours, and maybe we have to stop the test or something, but it doesn't fail. Now, it would have failed if we had kept going, but we didn't keep going. We had to stop the test. It turns out there is information in that. We don't want to just exclude this data point or pretend it failed at a different time. There is a little bit of math behind this, but JMP does all the heavy lifting for you. If you have a censored data point, you should use it, and it can help you make your estimates.

In today's talk, I'm going to assume that we have two active failure modes, that they're independent, and that they both have lognormal failure times. As a reliability engineer, we have to come up with a figure of merit, something we care about. When I was in the industry, a common figure of merit was the median time to failure. That's the time for 50% of your population to fail.

I was never a big fan of using the median time to failure. Imagine if you have parts in the field and you have a 50% failure rate, that's pretty bad. You want to know the time to something much smaller, maybe time to 10% failure or the time to 1% failure. I think an even better metric than that is to use the probability of failure. I want to be able to make a claim like, "I think that the field failure rate is going to be a half percent in 5 years," something like that.

Now, there's an old expression in statistics, and that is, "Nothing lies like an average." If I take an average of a sample, I have to understand there is some uncertainty in that estimate, and I acknowledge it using confidence bounds. The same is going to be true if I make a prediction of what the field failure rate is. If I say the field failure rate is 0.5%, 0.5% plus or minus what? That's where confidence bounds come into play. I'm going to use the upper confidence bound on the probability of failure as my metric because this way it's going to give me an estimate. It's a little bit more conservative, but it's more conservative in a rational way. I'm not making some arbitrary choice of what my upper bound is or what my metric is. I'm going to use the upper 95% confidence bound on the probability of failure.

I'm going to give a little background here just so we're all on the same page. I want to define a few things. I'm going to talk about F, and F is the probability of failure, and R is its complement. R is 1 minus F, and that's just the reliability. I'm also going to be discussing two failure modes, A and B. These are arbitrary, so I'm just giving them a label, A and B. It doesn't matter what they are. Now, if I have both modes operating and both can cause failure, I'm going to have an overall probability of failure. I'm going to call that Ftot. That's for when both mechanisms are operating. And so naturally, I'm going to have an Rtot as well, 1 minus Ftot.

The key to analyzing the data here is that we're going to use an approach that mimics that of two independent components in series. Let's suppose I have two components. I could have two circuits or two engines or whatever, and they're operating in series. What that means is if one fails, the entire system is said to fail. In that case, if I can assume independence, the overall reliability of my system is equal to the product of the individual reliabilities. Since I know that R is equal to 1 minus F, I can back out what the overall probability of failure is for both mechanisms, and that's given by this expression here.
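
Written out, those series-system relations are:

$$ R_{tot} = R_A \, R_B, \qquad F_{tot} = 1 - (1 - F_A)(1 - F_B) $$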

I'm going to make heavy use of the Arrhenius model here, and this is the Arrhenius model. Basically, the median time to failure is proportional to an exponential in inverse temperature. If I take the logs of both sides of that expression, it looks suspiciously like the equation of a straight line: here is a constant, here is the slope, and here is my Y. If I were to plot the log of the median time to failure versus 1 over kT, the slope of that line would be the activation energy, and the intercept term would be log C.
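
In symbols, with Ea the activation energy and k Boltzmann's constant:

$$ t_{50} = C \exp\!\left(\frac{E_a}{kT}\right), \qquad \ln t_{50} = \ln C + E_a \cdot \frac{1}{kT} $$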

I'm going to be talking about an actual data set of actual lifetimes for something called a GaN FET. GaN stands for Gallium Nitride, and FET stands for Field Effect Transistor. Basically, there were some Field Effect Transistors or FETs on life test at high temperature, and I have some failure times for those, and I did an analysis for two failure modes. But I wanted to show a little bit about how this works.

The way it works is I apply a voltage between the source and the drain. Then a current will flow between the source and the drain, and that current is controlled by the gate. I can apply a voltage to the gate, and I can either shut that current off or I can let it flow. As we know, nothing lasts forever. What this means is that I can apply a voltage between source and drain and a constant voltage on the gate, but over time, this current might degrade. It'll change, and I don't want it to change. I want it to be constant, but it starts to vary.

Another possibility is that instead of current going from the source to the drain where I want it to go, it goes from the source up to the gate. Now I have gate leakage, and that's something, again, I don't want. Another possibility is that the voltage which will shut off the current here, the voltage I need on the gate to do that, can become unstable, and it can vary. That means, again, my device won't be functioning properly. I have all these possible mechanisms which can be causing failure. I'm just going to be assuming that there are two, and I'm just going to give them the labels A and B.

Also, I want to talk a little bit about accelerated life testing. Suppose we have products in the field, and we'd like those products to last a long time. We'd like them to last years. But if you want to prove in the reliability for a new product, you can't afford to wait years and years to test. You have to get the testing done much more quickly than that. That's why we use accelerated testing. The idea is that we're going to up the stress. We increase the stress on the part, we run it under some high stress condition, and that makes the clock run faster. The key assumption is that the failure mechanisms aren't changing; we're not introducing anything new. Everything just happens more quickly.

There are various stressors that we can use. We can use temperature. Back when I was in semiconductors, we used voltage very often. You can change the environment: I can make it a humid environment or a dry environment. Here, we're going to focus on temperature. I want to show you an Arrhenius plot. This is the failure time on a log scale, plotted versus 1 over kT. Remember, this is inverse temperature, so the left side of the plot is actually high temperature, and the right side of the plot is low temperature. I run my life tests at different high temperatures, I have my failure times, and I can fit a line to that. The slope of this line is my activation energy.
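
One standard way to quantify how much faster the clock runs is the Arrhenius acceleration factor between a stress temperature and the use temperature:

$$ AF = \frac{t_{50}(T_{use})}{t_{50}(T_{stress})} = \exp\!\left[\frac{E_a}{k}\left(\frac{1}{T_{use}} - \frac{1}{T_{stress}}\right)\right] $$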

I'm going to take this data, and then I'm going to have to do something terrible. I'm going to have to extrapolate. We've all been told we shouldn't extrapolate. But in accelerated testing, there's no choice. We have to extrapolate because look at these failure times. It could be on the order of a million hours or something, and we can't wait that long. We do this extrapolation, and when we have the extrapolation, we can then get, say, a median time to failure, or also we can know the distribution of failure times at the use condition. When we know that, we can calculate things like time to 1% failure or the probability of failure or what have you.
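
For a lognormal mode, once the Arrhenius line gives the log median mu(T_use) and we have the shape factor sigma, the time to a given fraction failed p (for example, p = 0.01 for time to 1% failure) is just a quantile of that distribution:

$$ t_p = \exp\!\left[\mu(T_{use}) + \sigma\,\Phi^{-1}(p)\right], \qquad \mu(T) = \ln C + \frac{E_a}{kT} $$

where Phi^-1 is the standard normal quantile function.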

I want to give a little heads-up and show you where we're headed. Here are some case study results. I have this data which I obtained from a customer, and I analyzed the two modes, mode A and mode B. Here I'm just looking at the profiler. Here I have the distribution profiler for mode A. Again, I found this by treating all the mode B failure times as censored. That way I'm analyzing only mode A.

In the Profiler, I have my two factors: the temperature of the device and the time of interest. Let's say there's an industry standard, and the standard is that I need to know the reliability at 150 degrees C, and we want to know the probability of failure after 5 years; 43,830 hours is 5 years. Using the Profiler, I see that the probability of failure is about 5 times 10 to the minus 22, and the upper bound is about 0.001. Then I can do the same thing for mode B. For mode B, I'm going to treat all of the mode A failures as censored, so I have only mode B failures.

When I do that, again under the same conditions, my probability of failure is very small, 4 times 10 to the minus 89, and my upper bound is about 1 times 10 to the minus 17. Compare that to mode A, which had an upper bound more like 1 times 10 to the minus 3. The probability of failure due to mode B is much, much smaller than for mode A. It looks like mode A really dominates at lower temperature, that is, at use conditions. I'm going to show some more of this in a little bit.
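
To make that comparison concrete, here is a minimal Python sketch (the talk itself did everything in JMP) that plugs the two point estimates quoted above into the series-system formula. At these magnitudes, computing 1 - (1 - F) directly underflows in floating point, so the log1p/expm1 form is used instead.

```python
import math

# Point estimates at 150 degrees C and 5 years, as quoted in the talk (approximate)
F_A = 5e-22   # mode A probability of failure
F_B = 4e-89   # mode B probability of failure

# F_tot = 1 - (1 - F_A)(1 - F_B), evaluated on the log scale so the
# tiny probabilities are not lost to floating-point round-off
F_tot = -math.expm1(math.log1p(-F_A) + math.log1p(-F_B))

print(F_tot)  # ~5e-22: mode A completely dominates at use conditions
```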

Then after I talk about this case study, I'm going to change gears a little bit. I'm going to talk about how we plan a life test. Let's suppose you have a part that you're going to test, but you know that you have more than one failure mode, and you want to plan around that. What temperatures do I use? What happens if there's a big difference in activation energies between these modes, and things like that? How many temperatures do I need? Well, I boil it all down to a contour plot.

In the contour plot here, I have all these input factors. These are all my planning values. I have z_A and z_B; these are Z-scores for my two different failure modes, and I'll show how to calculate them later. I have other planning values, like how many parts per temperature I have, how many temperatures I use, things like that. What I get out of it is this contour plot, which is basically a response surface.

I have my two factors, z_A and z_B, and as they increase, the upper confidence bound on the probability of failure goes up, up, up. So everywhere on this contour, here, all those values of z_A and z_B, the upper confidence bound is 10 to the minus 2. Here, for all these values of z_A and z_B, again, the upper confidence bound is now 10 to the minus 3, et cetera. We're going to be using this contour plot in a little bit, and I'll show you some more.

Let's get into the case study. Again, I obtained this data from a customer. There were 48 GaN FETs tested over three temperatures: 320, 337, and 355 degrees C. The test was run for a good long time, but after about 2,100 hours, it was stopped. This is just a summary table showing what was done. Here the DUT is the Device Under Test, and these are the temperatures it was run at. Then I had different numbers of parts at each temperature. Notice that the allocation is not equal: I have fewer parts at the highest temperature than at the lower temperatures.

That's actually a good idea because, remember, what we're going to do is extrapolate down to use conditions. When we extrapolate, we want to make sure that the confidence bounds on whatever estimate we come up with are as narrow as possible. It turns out that if you pile up your parts and put more at the lower temperatures, that confidence bound is going to be a little bit more narrow, and that's good. That's why it was done this way.

Then looking at the failures, we see that we had a lot more mode A failures than mode B failures. But clearly, we're getting more mode B failures as the temperature increases. Maybe the trend is not quite so strong for mode A, but for mode B, we can definitely see more failures at higher temperature.

What I did was I took the life data and I did an analysis and I got out the activation energies, and I was able to create this Arrhenius plot. Again, I have my failure time on the log scale versus 1 over kT. The slope of this red line here is the activation energy for mode B, and the blue line is the activation energy for mode A.

If we look at this, you can think about extending these lines out. If I extended them to higher and higher temperatures, we'd expect the failure times for mode A to be longer than those for mode B. What that means is that mode B would dominate at higher temperature, and that's what we see: mode B gets stronger and stronger as the temperature goes up. By the same token, if I extrapolated to lower temperature, the failure times for mode B would be very high, much higher than for mode A. The parts would fail due to mode A first, and mode A would dominate at lower temperature. Again, that's what we're observing.

I took the life data... This is just an example. This is what the data looks like. For example, I had a part here after it was run at 320 degrees C. It failed after about 261 hours, and it failed due to mode A. Also, I have a column here, a censoring column, which tells JMP, is this failed or is this censored? Again, JMP is going to do the heavy lifting and do the analysis for us behind the scenes. All we have to do is tell it which has failed and which is censored.
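
As a rough illustration of that layout (the actual JMP table and column names may differ; only the 261-hour, 320-degree, mode A row echoes the talk, and the other rows are made up), the data might look like this, along with the trick of re-censoring the other mode:

```python
import pandas as pd

# One row per device: time, temperature, observed mode, and a censor flag.
# Here 0 = failed, 1 = right-censored (still running when the test stopped);
# JMP lets you tell the platform which code means censored.
data = pd.DataFrame({
    "Hours":  [261, 1043, 2100],
    "Temp_C": [320,  337,  355],
    "Mode":   ["A",  "B",  None],   # None: the unit never failed
    "Censor": [0,    0,    1],
})

# To analyze mode A alone, treat every row that is not a mode A failure as censored
data["Censor_A"] = ((data["Mode"] != "A") | (data["Censor"] == 1)).astype(int)
```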

I checked the data. I used the Fit Life by X platform to make predictions, and I did this one at a time. I have data for mode A only, and again, all the mode B failure times were censored. When I did that, I got my failure times here. I have my distribution at the three different temperatures. JMP produces this.

You can see that the spacing here between, say, 190 and 170, that's 20 degrees, is different from the spacing between, say, 290 and 270. That's because we're doing a transformation; the axis is actually transformed from 1 over kT. That means the slope of this line is still my activation energy, and I can do my extrapolation to use conditions. My use condition is about, say, 150 degrees C. When I do this, I see that my median time to failure is maybe a little bit over 1 times 10 to the 11th hours. I also know the distribution of failure times here, so I can calculate things like time to 1% failure, or the probability of failure in 5 years, or something like that.

Once again, we have our distribution profiler, so I can put in different values if I want to. I can move these around to see what the probability of failure would be under different conditions. I'm also getting my Arrhenius parameters here. JMP calls the intercept term beta naught; that was my log C from earlier. Here, the intercept term is right around -40. JMP calls the activation energy beta 1, and that's about 2.4 eV, and then there's the shape factor, which here is about 1.6.

Basically, from these Arrhenius parameters, you can predict the log mean, which is a function of temperature T. Then, once you know the shape factor, you can completely describe the distribution. That's why we're able to predict the probability of failure at these conditions, or at any time and temperature we care about.
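
Here is a small Python sketch of how those pieces fit together for a point estimate (no confidence bound). The parameter values are rounded off from the talk, so the answer only matches to within an order of magnitude or so.

```python
import math

K_BOLTZ = 8.617e-5   # Boltzmann constant, eV/K

def prob_fail(t_hours, temp_C, b0, Ea, sigma):
    """Arrhenius-lognormal probability of failure by time t at temperature T.

    b0    : intercept (JMP's beta 0), on the log-hours scale
    Ea    : activation energy in eV (JMP's beta 1)
    sigma : lognormal shape factor
    """
    mu = b0 + Ea / (K_BOLTZ * (temp_C + 273.15))   # log median life at this T
    z = (math.log(t_hours) - mu) / sigma
    return 0.5 * math.erfc(-z / math.sqrt(2.0))    # standard normal CDF of z

# Rounded mode A parameters from the talk: b0 ~ -40, Ea ~ 2.4 eV, sigma ~ 1.6
# Result is on the order of 1e-21, the same ballpark as the value quoted above.
print(prob_fail(43_830, 150, b0=-40, Ea=2.4, sigma=1.6))
```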

Now I can do the same thing for mode B. I analyze the data for mode B, and here, again, I treated all the mode A failures as censored. If you remember, I had far fewer failures due to mode B. You can see that from the key here: the triangle means censored, and I have a lot of censored data points. Even with that much censoring, I can still do the extrapolation down to use conditions. The median time to failure here is more like, say, 10 to the 25th hours. That's much, much higher than for mode A, so again, we would expect mode A to dominate at lower temperature.

Then we also have our distribution profiler. This also tells us something. We're analyzing these two failure modes separately, which means we can ask what would happen if mode A were completely eliminated: say I changed my design, I changed how I process things, and I'm able to eliminate mode A, so all I'm left with is mode B. If only mode B were active, the parts would be much more reliable. Now I can play games. Maybe I have a spec, and my spec is that the upper bound should be no worse than 10 to the minus 3 in 5 years. Right now it's 10 to the minus 17, so I can use the profiler to ask, can I survive at 200 degrees?

Well, now the upper bound is 4 times 10 to the minus 8. I can go a little higher, maybe 225; now it's more like 3 times 10 to the minus 5. I can go even higher if I want to, say 240, and now we're getting close to our spec of 0.001. This is actually good news. It helps point us in a direction for where we need to go with our reliability program and where we're going to get the most bang for our buck. It also helps us open up the spec.

Maybe the customer would like to operate the part at a higher temperature. So now we can say, "Yeah, sure, go ahead. You don't have to keep it at 150. You can operate much higher if you want to, and you're still okay."

JMP returned my Arrhenius parameters for mode B as well. The intercept term was very different, minus 100 or so. The activation energy, you can see, is close to 6 eV, versus about 2.4 eV for mode A. That's very different, and that's why the slope in my Arrhenius plot for mode B was much steeper. I also get the shape factor, which was larger. Again, with these parameters, I can predict the probability of failure at any time and temperature of interest.

Let's just summarize what we went over here. This is a summary table. I analyzed the data for mode A and mode B, and I got upper confidence bounds on each. It turns out that JMP does not return an upper confidence bound on the overall probability of failure, which is what I was after. It uses a Wald technique to get the upper confidence bound for each mode, but unfortunately, it's difficult to apply the Wald technique when two modes are operating, so I used a different method. When I did that, I got a slightly different answer, but about what we would expect.

Since mode A dominates, we would expect that if both modes are operating, the overall result is still going to be driven mostly by mode A, so the upper bound should be somewhere around 1 times 10 to the minus 3. We're getting something in that ballpark, so it looks like this is working correctly.

Let's move on and talk about planning a life test. We're going to change gears here. For many years, there has been a lot in the literature and in textbooks on how to plan a life test for the Arrhenius case, where I'm testing over temperature. But less has been published for the case where more than one failure mode is present. Of those publications, I think most are highly mathematical. That's fine as far as it goes, but I like a graphical approach. I think a graphical approach is more intuitive.

What I did was I simulated a whole bunch of failure times under different test conditions, and I took those results, and then I calculated the upper confidence bound on the probability of failure, and then I modeled that, so I could predict what the upper confidence bound would be for given test conditions or different assumptions for planning values, number of temperatures, things like that. I want to give a little background here, so you all understand what we're talking about.

Again, I'm going to assume I have two failure modes, A and B. These are generic, so I'm just calling them A and B. Each mode is going to have its own Z-score, z_A and z_B. I'll show you how to calculate the Z-score in a second. I did this to simplify my life. I didn't want to have so many free parameters, like activation energies and shape factors and all these things. I found that if I could boil everything down to a Z-score, I didn't have quite so many things to vary, but I could still back out what the activation energy was, et cetera. That was very helpful, because I didn't have to simulate so many different values.

Also, there were other inputs as well, things like the number of parts per temperature, the number of stress temperatures, and so on. Here's how we calculate the Z-score. Typically, we have some industry standard. We're interested in the time to failure after 5 years or 10 years or something like that. We may also have an industry standard for the use conditions. We want to be able to tell the customer that they can use the part at 150 degrees C or 100 degrees C, something like that. These are known. We're also going to have to input some planning values.

We're going to have to guess, or talk to subject-matter experts, to get estimates of what our Arrhenius parameters are. Then we just plug those values into our formula, and we get a Z-value; that's our Z-score. Since we have two failure modes and they each have their own Arrhenius parameters, I'm going to have two Z-scores, one for mode A and one for mode B.
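
The formula itself is on the slide rather than in the transcript, but for a lognormal mode with an Arrhenius location it is presumably the usual standardized log time at the use condition:

$$ z = \frac{\ln t_{use} - \mu(T_{use})}{\sigma}, \qquad \mu(T) = \ln C + \frac{E_a}{kT} $$

so that the probability of failure at use conditions is just Phi(z).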

This is just a summary table showing I varied all these parameters over a very wide range. I varied z_A and z_B over a wide range. I used a wide range for the use temperature, the number of temperatures, et cetera. I also wanted to boil things down. I did not want to have a huge matrix of lots of different possible stress temperatures. I made my life easy by boiling it down to just three metrics. For example, if I tell you what the lowest stress temperature was, T1, and I tell you that I had four temperatures, and I tell you what the spacing between those temperatures was, then you know everything. Then you can calculate what the four stress temperatures were.

Rather than having a whole huge matrix of all these possibly different stress temperatures, I brought it down to just three parameters: the lowest stress temperature, the number of temperatures, and the spacing between those temperatures. Again, I also varied the sample size, the activation energies and shape factors, et cetera.
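
In other words, the full temperature list can always be rebuilt from those three knobs; for example (hypothetical values):

```python
def stress_temps(t_low_C, n_temps, spacing_C):
    """Recover the full set of stress temperatures from the three planning knobs."""
    return [t_low_C + i * spacing_C for i in range(n_temps)]

print(stress_temps(300, 4, 20))   # [300, 320, 340, 360]
```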

What I did was, of all those possible test conditions, I picked 1,900 unique sets of planning values, and for each one of those 1,900 sets, I generated 500 data tables. For each table, I created randomly generated lognormal failure times: one column for mode A failures and another column for mode B failures, and then I would take the minimum. That's what competing risks means: the two failure modes are competing to kill the part, and whichever one has the lower failure time wins; that becomes the failure time for that part. For each of the 500 data tables, I could calculate Ftot, because I know FA and FB, so I can put those into my formula and calculate the overall probability of failure.
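
Here is a sketch of that competing-risks generation step, under the same Arrhenius-lognormal assumptions. The simulations in the talk were done in JMP, and the parameter values below are just placeholders; the structure (draw a time for each mode, take the minimum, then censor) is the point.

```python
import numpy as np

rng = np.random.default_rng(1)
K_BOLTZ = 8.617e-5   # Boltzmann constant, eV/K

def simulate_competing(temps_C, n_per_temp, censor_hours, params_A, params_B):
    """One simulated data table with two independent Arrhenius-lognormal modes.

    params_A and params_B are (intercept, Ea_eV, sigma).  Each part's failure
    time is the minimum of its mode A and mode B times (competing risks),
    truncated at the censoring time.
    """
    rows = []
    for T in temps_C:
        inv_kT = 1.0 / (K_BOLTZ * (T + 273.15))
        for _ in range(n_per_temp):
            times = {}
            for mode, (b0, Ea, sigma) in (("A", params_A), ("B", params_B)):
                mu = b0 + Ea * inv_kT                  # log median life at T
                times[mode] = float(np.exp(rng.normal(mu, sigma)))
            mode = min(times, key=times.get)           # whichever mode fails first
            t = times[mode]
            censored = t > censor_hours
            rows.append({"Temp_C": T,
                         "Hours": min(t, censor_hours),
                         "Mode": None if censored else mode,
                         "Censor": int(censored)})
    return rows

# Placeholder planning values, loosely in the spirit of the case study
table = simulate_competing([320, 337, 355], n_per_temp=20, censor_hours=2100,
                           params_A=(-40, 2.4, 1.6), params_B=(-110, 6.0, 2.0))
```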

Since I had 500 of them, I have an idea of what the distribution of Ftot was for each test condition. I could use the quantiles: I could take the 97.5 quantile of the 500 values and use that as my upper confidence bound. Now I would have 1,900 upper confidence bounds, one for each of those test conditions. Then I could take all that data, feed it into a neural network, and make a prediction. I could see what impact my test conditions, or my assumptions for the planning values, had on the upper confidence bound. One question after doing this was, which factors were most important?

I used the predictor screening platform for that. I'll show that here. There we go. I have my predictors z_A and z_B, and I see that about 80% of the time, z_A and z_B showed up as being important in predicting the upper confidence bound. Then N, the number of parts per temperature, showed up about 15% of the time. Between them, those three showed up 95% of the time. Now, there were these other factors, too, and they did not show up as often; the percentages are much smaller. But I included them in my model anyway, because I found that varying them did have an impact on the contours. It was smaller than for those top three, but I wanted to be able to take it into account.

Also, the censoring time: I ended up not using that at all. That's because when I did the simulations, I assumed that the practitioner would choose temperatures and times where they would get some reasonable number of failures. You don't want to choose a temperature that's way too low or a censoring time that's way too short, because then you'd have 100% censored data, and you can't do much with that. Since I chose moderate censoring times, the censoring time, as expected, did not really play much of a role, so I did not use it in my analysis.

I did a neural network fit, and if you are familiar with neural networks, you know they are prone to overfitting. What I did was I used the technique that is typically used in this situation to alleviate that, and that is the training, validation, and test approach. What does that mean?

What I did was I randomly picked 50% of my data, of those 1,900 rows, to go into the training subset, 25% into validation, and 25% into test. JMP does this internally: it fits a neural network using the training data and then tests how well it predicts the validation data, which was not part of the training. The model has not seen that data, and it keeps adjusting the parameters fit to the training data until it does a better and better job on the validation data. There's also an internal way of making sure you don't overfit or underfit; you want to fit just right, so the algorithm stops if you start to overfit the data.
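
A rough analogue outside JMP looks like the sketch below (placeholder data; note that JMP's Neural platform actually uses the validation set during fitting to decide when to stop, whereas this simple version only scores it afterward):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Placeholder data standing in for the 1,900 rows of planning values
# (z_A, z_B, N, ...) and the simulated log10 upper confidence bound.
rng = np.random.default_rng(7)
X = rng.normal(size=(1900, 6))
y = X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=1900)

# 50% training, 25% validation, 25% test
X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.50, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.50, random_state=0)

model = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000, random_state=0)
model.fit(X_train, y_train)

print("validation R^2:", model.score(X_val, y_val))
print("test R^2:      ", model.score(X_test, y_test))
```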

Finally, you take that final model and run it against the acid test, the test group. The model has never seen that data at all, and you want to make sure it does a good job there, too, because then you can trust the model. I did that. For the training data, I had a nice high R-squared, 0.99, which is fine, but really the question is, how well does it do on data it hasn't seen? In this case, I found that for the validation data set I got an R-squared of about 0.98, and even for the test set I got an R-squared of 0.96, so it's really doing pretty well.

I would have been concerned if one of these two values had been much lower than 0.99, because that would mean I was overfitting the data. That's not what happened. I got a very good fit, so I'm happy with this, and I'm going to go ahead and use it to make predictions.

Let's go ahead and see how we would create a test plan with this. Let's assume we have some corporate standard, and our goal is to make sure that the upper confidence bound on the probability of failure is no worse than 1 times 10 to the minus 4 after 5 years at 125 degrees C. We want to make sure we pick the right temperatures, the right number of temperatures, the right sample size, et cetera, so that at the end of the day, we have a pretty good likelihood of hitting that target. We're going to use historical values for the Arrhenius parameters to help us out. How do we do this?

Well, maybe we have previous experience. We have tested parts like these before, and we know, for example, that mode A dominates at low temperature, and we have an idea of the activation energy and the shape factor. Since we have tested before, we can back out the intercept term as well, and then we just plug and chug. We know the time of interest is 5 years, we know the temperature of interest is 125 degrees C, and we know our Arrhenius parameters, or at least we're willing to guess them, and we plug it all in.

In this case, we see that z_A is about minus 7. We're going to do a similar thing for mode B, though not exactly the same. We have an idea of what the activation energy and shape factor are. But to make my life easier, I introduced a constraint: if you're life testing and both modes are present, I'm going to assume that at your middle temperature, the median time to failure for each mode is the same. That's because if we were testing several temperatures and mode B was not present, or only present at the last temperature, I would probably just throw that data out and use the mode A failures alone.

Here, though, I'm going to assume that mode A and mode B have the same median time to failure at that middle temperature. It also makes a certain amount of sense: if mode B was dominating as we went along, we would not expect some sudden shift in the median time to failure as we go from mode B to mode A at some temperature. It's probably not going to be a cliff; it's probably going to be some sort of smooth transition.

If I know what's going on with mode A, and I know the activation energy for mode B, or at least I assume it, then with that point fixed I can extrapolate the mode B line back to the Y axis to get its intercept. Given that constraint, I can calculate the intercept term for mode B. Then I do the same thing: I plug and chug with the values I have for z_B, and I get out about minus 6.
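
Putting those steps together, here is a small Python sketch of the planning-value calculation: standardize mode A directly, then pin the mode B intercept with the equal-median constraint at the middle stress temperature. The numbers are placeholders, not the historical values used in the talk, so they will not reproduce the minus 7 and minus 6 quoted here.

```python
import math

K_BOLTZ = 8.617e-5            # Boltzmann constant, eV/K
t_use = 43_830                # 5 years, in hours
T_use = 125 + 273.15          # use temperature, K
T_mid = 337 + 273.15          # middle stress temperature, K (a planning choice)

# Mode A planning values (intercept, activation energy in eV, shape) -- placeholders
b0_A, Ea_A, sig_A = -40.0, 2.4, 1.6
z_A = (math.log(t_use) - (b0_A + Ea_A / (K_BOLTZ * T_use))) / sig_A

# Mode B: assume an activation energy and shape, then fix its intercept by
# requiring the same median life as mode A at the middle stress temperature
Ea_B, sig_B = 6.0, 2.0
b0_B = b0_A + (Ea_A - Ea_B) / (K_BOLTZ * T_mid)
z_B = (math.log(t_use) - (b0_B + Ea_B / (K_BOLTZ * T_use))) / sig_B

print(round(z_A, 1), round(z_B, 1))
```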

Let's see what that looks like. I can generate my contour. Here we go. Let's put our values in here: z_A was minus 7 and z_B was minus 6. I've got the crosshairs here, and my goal was to make this 10 to the minus 4. I am shy of that goal. It's telling me minus three point something, and that's why the crosshairs sit between the minus 3 and minus 4 contours. We're not quite there, which means we're going to have to make an adjustment. Maybe we were wrong in our values or choices for z_A; maybe we can adjust those and pull this down so it crosses the minus 4 contour. Or maybe we can just increase the sample size. Right now, my sample size is 20. I can increase it to 40. Look, now I'm past the minus 4 contour, at minus 4.1. So if I use a sample size of 40, I can meet the goal.

Let me put this back for a second.

There are two other things we can get from this contour plot. One is we can use it as a rough guide to sensitivity. We want to know what happens if z_A changes, say because my activation energy or shape factor changes, et cetera. Right now I'm at minus 7, and the contour is at minus 3.1. If I put this at, say, minus 5, then I get about minus 2.1. So if z_A increases or decreases by 2, my upper confidence bound changes by about an order of magnitude. That might be good to know.

The other thing I can do here is use this graph for a single failure mode, even though I built it for two. To do that, let's say mode B essentially doesn't happen; the probability of failure due to mode B is really, really small, which corresponds to a very negative z_B. I can just put in a z_B of minus 10, and then set z_A back to, say, minus 7. Now I can pay attention to the z_A axis alone and look to see where the contours cross it. That gives me some freedom; maybe I can decrease my sample size or something like that.

I can go down to 10 parts rather than 20 because my goal is to hit that minus 4, and I'm right around the minus 4 mark if only mode A is active. That's just another way you can use these plots.

Let me wrap up. We covered a lot of ground today. I analyzed a life test data set for GaN FETs, and we used the competing risk method to find the probability of failure for the two different modes. I also showed what could happen if we could eliminate one failure mode: for example, if we eliminate mode A and we're left only with mode B, the reliability would greatly improve, and that would allow us to, say, open up the spec.

I also went over how to plan an accelerated life test when you have two failure modes present. I did that using a contour plot built from a neural network fit, and the fit looked really pretty good. I showed how to use those contour plots, and that they can also be used for a simple sensitivity analysis: we can see what effect varying our planning values has on the upper confidence bound. I also showed that you can use the same plot for a single mode; you don't have to use it only when two failure modes are active. That's it. Thank you very much.