We present a case study in which "Introduction to SQC," the lowest-rated course in our training program for new employees, was redesigned around a simulated manufacturing exercise called the "clay manufacturing exercise," so that participants put what they learned in the classroom into practice and acquire a much broader set of skills:
1. Manufacturing (QMS perspective, process)
2. Organization and roles needed for manufacturing (people)
3. Elements of manufacturing (5ME)
4. Manufacturing metrics (KPI)
5. Variation and visualization (SQC)
6. The Toyota way of working (8-step problem-solving thinking)
7. Teamwork
8. Presentations
Participant feedback:
- "I experienced a manufacturing flow close to actual practice, which helped me understand the role I should play in my own work."
- "We assigned roles as a team, each of us worked through our part to complete the final product, and we were able to identify the problems. I felt it was a valuable course that will let me apply SQC in my actual work."
- "It was interesting to think about the manufacturing flow in terms of costs."
- "Being able to discuss, consult, and cooperate with my peers deepened my understanding considerably."
- "Hearing the feedback on each group's presentation let me learn from an objective point of view."
An instructor who attended this exercise has also customized it for elementary school students, making the content a useful reference for the idea that "making things is about developing people."
At the Kagoshima Prefecture tea market, with the aim of improving crude tea (aracha) quality, the appearance and infusion color of the crude tea offered at auction are photographed with a digital camera, and the values obtained from texture and chromaticity analysis of these images are fed back to each grower via smartphone, together with the unit price and the images. In this study, we used various JMP capabilities to examine methods for explaining and predicting crude tea component values, which strongly affect taste and aroma, from cultivation information and image analysis data. We used a data set of 1,292 first-flush (ichibancha) samples delivered to the tea market from producing areas across the prefecture, image-analyzed, and, after auction, analyzed for components by near-infrared spectroscopy. Principal component analysis of the first-flush crude tea component values (total nitrogen, free amino acids, theanine, fiber, tannin, caffeine, and vitamin C) showed that 74.4% of the variation among the auctioned teas could be explained by principal component 1, an index related to total nitrogen and fiber, and principal component 2, an index related to tannin and caffeine content; unit price was positively correlated with total nitrogen and negatively correlated with fiber. Control charts showed that total nitrogen and fiber frequently exceeded the control limits in the second half of the season (mid- to late-maturing cultivars); with a lower specification limit of 5% for total nitrogen content and an upper specification limit of 22% for fiber content, the nonconformance rates were 4.9% and 6.4%, respectively. To predict total nitrogen and fiber from the crude tea image analysis data, we fit PLS regressions and response surface models using cultivation information and 10 image analysis items as parameters. Both component values could be explained by models that included auction date, cultivar, and the image analysis item "white stem" (an index of the maturity of the harvested new shoots). Furthermore, using the design space feature of the profiler, it was predicted that harvesting and processing the mid- to late-maturing major cultivars by May 7 with white stem at level 3 or below would bring the within-specification rate for total nitrogen and fiber to 98.9%. In summary, crude tea component values can be predicted from cultivation information and image analysis data, and cultivation targets for keeping them within specification were obtained. We present an example of how this information is being used to guide harvesting and processing in the field.
Thursday, March 7, 2024
Ballroom Ped 4
Companies in the pharmaceutical industry must demonstrate shelf life by measuring product performance over time at storage temperatures. To accelerate the test, it is often also done at elevated temperatures. Although Arrhenius demonstrated in 1889 how to combine results at different temperatures into one model, many companies still analyze each temperature separately; it is not cost-efficient to stratify the data into different models. In addition, ongoing verification of shelf life must be performed, where it is often enforced that all individual observations are inside specifications. Due to measurement noise, this criterion is often not met. Instead, it should be enforced that measurements are inside the prediction intervals from the initial shelf-life study, which is a weaker requirement. JMP has the Arrhenius equation in the Degradation/Nonlinear Path/Constant Rate platform. However, this platform lacks some of the excellent features of the Fit Model platform, such as studentized residual plots, Box-Cox transformation, random factors, and prediction intervals. This presentation demonstrates how the Arrhenius equation can be entered into the Fit Least Squares platform by making a Taylor expansion with only four terms, as well as how a JMP workflow can ease the calculations.
Thank you for giving me this opportunity to talk about how we, as JMP partners at NNE, help our clients make better shelf-life studies and, especially, minimize the number of non-conformities during verification. First, I will describe the issues we see with shelf-life studies, and especially with the ongoing verification, where you have to confirm at regular intervals, by testing some batches, that the stated shelf life is still valid. If you look at the first study, the shelf-life estimation where you set your shelf life, you typically take a set of batches, 3-4, and let them decay over time. You measure these batches over time, see how much they decay, and from that you can calculate how long a shelf life you have. The level you see over time depends on the slope, the decay, which is the main purpose of a shelf-life study, but it also depends on where you start. Since you only have three or four batches in your shelf-life estimation study, and you are going to verify it on other batches going forward, if those batches start lower than the batches in your shelf-life study, you might have a problem in verification even though they do not decay any faster; they simply start lower. Our solution is to convert the shelf-life requirement into a requirement on the start value, simply a release limit, which of course has to be better than what you require at shelf life, so there is room for the change. Then you are not sensitive to future batches, because you ensure they start high enough. We also often see that the regression over time is done on an absolute scale, but degradation is relative. We strongly recommend taking Ln of your data before you make the regression. We also see many companies having big issues with measurement reproducibility, and shelf life is the most difficult measurement situation you have, because for obvious reasons you have to measure the batches at very different time points, typically years apart. Everything changes, and if you do not have a very stable measurement system, you get a lot of noise on your regression curve. You can actually reduce that by entering the measurement time point as a random factor in the model. We also very often see companies running shelf-life studies at several different temperatures, which is a good idea because then you can accelerate the test. But for some reason the temperatures are typically modeled on their own; it is very rare that we see people modeling across temperatures to get one model describing all temperatures. We strongly recommend modeling across temperatures, because then you have more degrees of freedom to estimate your residuals. In the same area, when people model each temperature on its own, they need time 0 measurements at all the different temperatures, but it is actually the same measurement, because at time 0 it is just the start value. You have to be careful when you go on to model across temperatures: you should not have that same observation in the data four times if you have four temperatures. When modeling across temperatures, it is very important to have only a single registration of the time 0 point, at one temperature.
Then it doesn't really matter which temperature, because it's at time 0. These are the issues we see when people are setting shelf life, and the solutions we recommend. Then there is the shelf-life verification: now that you have stated a certain shelf life, you test at regular intervals to prove that this shelf life is still valid, and there is an ICH guidance for that, which is also included in JMP; you'll see that in a minute. What you do there is take the typical three to four batches, take the confidence limit on the slope for the worst batch, and use that to state your shelf life. But this is a bit challenging, because you are then assuming that you have seen the worst batch among the first three, which is typically not the case. This is actually the reason why we do not recommend the ICH method: it often leads to problems in verification. We also often see people getting a too optimistic estimate of the slope's standard error, because they assume independent observations, so the number of degrees of freedom is just n-1. However, very often many measurements are done in the same analytical run. If you have run-to-run differences, the measurements are not independent, and you need to correct for that through the effective degrees of freedom. Simply putting the measurement date into the model as a random factor takes care of that. You will then typically get a somewhat bigger standard error, which might be seen as a problem, but what actually happens is that it increases the requirement on the start value, and thereby you minimize the risk of failing in verification. Last but definitely not least, we still see many companies requiring that at verification all measurements should be inside specification. But if you have a measurement issue, and you always have a measurement issue, then you can actually be outside specification at shelf life purely due to measurement noise. What we recommend instead is to build a proper model with the right degrees of freedom, on the same data you used to set the shelf life, and from this model make a prediction interval where you can expect future observations to be. This interval will typically be slightly wider than your specification interval, thereby minimizing the risk of failing. I will now briefly describe the formulas and the platforms and where to find them in JMP, but I will do this quite rapidly, because you can get access to the formulas in the presentation material afterwards, and I think it's more interesting to demonstrate it in JMP. Let's first start with the release limit. How do you convert your slope and your standard error on the slope into what the release limit at start should be? Say we have something that drops over time and we have a Lower Specification Limit. Then, following the VSO guidance on stability evaluation, you simply take the Lower Specification Limit and subtract the estimated slope times the shelf life, that is, how much it drops over the shelf life. Then you also need to add some uncertainty, coming from the standard error on the slope and from the measurement standard deviation on your starting value. We are using exactly that formula, except that we convert the normal quantile to a t-quantile, because the standard deviation is unknown.
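For reference, the release-limit calculation just described can be written out as follows. This is a hedged reconstruction from the spoken description, not a formula taken from the slides, and the symbols are my own:

$$\mathrm{LRL} \;=\; \mathrm{LSL} \;-\; \hat{b}\,T_{\text{shelf}} \;+\; t_{1-\alpha,\;\nu_{\text{eff}}}\,\sqrt{T_{\text{shelf}}^{2}\,\mathrm{SE}(\hat{b})^{2} \;+\; \frac{s^{2}}{n}}$$

Here $\hat{b}$ is the estimated slope on the Ln scale (negative for a decaying attribute, so $-\hat{b}\,T_{\text{shelf}}$ raises the limit above the LSL), $T_{\text{shelf}}$ is the shelf life, $\mathrm{SE}(\hat{b})$ is the slope's standard error, $s$ is the residual (measurement) standard deviation, $n$ is the number of release measurements averaged, and $\nu_{\text{eff}}$ is the effective degrees of freedom used for the t-quantile.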
As an estimate of the measurement repeatability, we're actually using the residual error from the model. But besides that, it's exactly as described in the VSO guidance. Of course, when you build your model in JMP, you get the slope, you get the effective degrees of freedom based on the equation down here, you get the standard error on the slope, and you get your residual error. You can just feed these into the formula, and you have your Lower Release Limit. If it turns out that one of the bottlenecks is the measurement noise on the start value, you can make several measurements at start and take the average of these, and thereby suppress this noise by the square root of N. But of course, these measurements have to be taken in different analytical runs. Next, when we model the response versus time, you can just go to Fit Model and make your regression. There we strongly recommend that you take the log of your result, because then, with a constant-rate reaction, it will be linearly proportional to time. If the decay is small, you can also do it without the log, but why bother about whether the decay is small or large? Just take the log and it works no matter the size of the decay. If you want to model across temperatures, you have to use the Arrhenius equation, where you describe the decay at different temperatures using an activation energy. This can nicely be described in JMP in the Degradation platform with a nonlinear path: there you can build a model across all temperatures, which is actually pretty easy to do once you find the platform, and I will demonstrate it in a minute in JMP. Based on that, you get your model coefficients: an intercept, a slope, and an activation energy, and they even come with standard errors and covariances. If you put these numbers together, you can calculate the slope at each temperature and the standard error on the slope at each temperature, and then feed this into the Lower Release Limit formula to know what that limit should be. All these model parameters, standard errors, and correlations you simply get from JMP, but you have to put them into the equation shown up here, and it is not as straightforward as it might look. For that reason, we recommend making a Taylor expansion of the Arrhenius equation, because then you can fit it with a polynomial. Normally, we see that up to third order is sufficient, and that requires four different temperatures. The great thing about going to Fit Model, which you can do once you have the Taylor expansion, is that you can put in random factors. As I mentioned previously, you need to put in measurement time as a random factor, but often you would also like to have batch as a random factor, because we would like to predict what happens in future batches, not just the ones used for the study. Of course, you also get better model diagnostics. If you scale your Arrhenius temperature properly, so that it is 0 at the temperature of interest, all these terms drop out because they are 0. Then it is very easy to get the slope and its standard error, because they are just the coefficient and standard error in front of the time parameter in your model.
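To make the Taylor-expansion trick concrete, here is one way to write it in my own notation; the parameterization is an assumption and may differ in detail from the presenter's slides. With a constant-rate reaction, the Ln response is linear in time, with a rate that follows Arrhenius in the Arrhenius temperature $x$ (proportional to $1/T_{\text{abs}}$). Centering $x$ at the temperature of interest, $\Delta x = x - x_{\text{ref}}$, and expanding the exponential to third order gives

$$\ln Y \;=\; a \;-\; b\,e^{-E_a\,\Delta x}\,t \;\approx\; a \;-\; b\,t\left(1 \;-\; E_a\,\Delta x \;+\; \tfrac{1}{2}(E_a\,\Delta x)^2 \;-\; \tfrac{1}{6}(E_a\,\Delta x)^3\right),$$

so the model terms entered in Fit Model are $t$, $t\,\Delta x$, $t\,\Delta x^{2}$, and $t\,\Delta x^{3}$. At the temperature of interest $\Delta x = 0$, the higher-order terms vanish, and the slope and its standard error are simply the coefficient on $t$ and its standard error.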
It's much easier than what I showed on the previous slide. Then you can also put in on top of the third order data expansion, you can also put in batch multiplied by time, interaction between batch and time, to see if there should be a batch dependent slope. Hopefully, that's not the case, but it can be the case, or even worse, we also put in a Arrhenius temperature time, times batch to see if the activation energy should be batch dependent. That's rarely happened, and of course, shouldn't happen. But I think it's nice to check before we make the assumptions. Now, let's get into JMP to see how does this work in JMP. I will now shift to JMP. Here I have a case where I have studied some batches at different temperatures and at different times. Let's first look at the result. Here you see the result, of course, with an Ln on. It's supposed to be linear at three different temperatures, 15, 25, 30, and 40, from 0-36 months. As you can see, as expected, the higher the temperature, the higher the slope. This can easily be described by the Arrhenius equation. Each batch is measured in duplicate at each time point. You can also see the typical case that we have some measurement variation from day to day. For example, you can see all the measurements made at three months, they are above the regression equation, indicating at that day we measured too high, and for example, at month nine, they're typically too low. Clearly, this curve is contaminated with... Or these numbers are contaminated with your measurement noise. This is quite typical because it's measured at very different time points, and they can easily be that you measure higher on some days than others. You will see what influence that makes in a minute. Let's start doing some modeling. You can actually go into the degradation analysis in JMP. There, you cannot combine the temperatures, but you can do it by temperature. This is actually following this ICH guidance. I just opened it here for 15 degrees. It works the way that you, with a significance level of 0.25, look for, can you assume common slope and common intercept. If the P-value is below 25, you have to use separate intercept and separate slope. You can, for example, see here for the 15 degrees, following the ICH guidance with this significance criteria, you can assume common slope, but you will have different intercepts. Then you actually with these different intercept common slope, you are making a common interval on each regression line, and then you take the word batch, which in this case is batch B. Then you are saying where this catches the spec limit, the lower spec limit, Ln scale, this is your shelf-life, in this case, 55 months. It's very easy to do, but there are some problems with this method. The first thing is that this significance level of 0.25, of course, give a high risk of getting false significance on that you need to have different slopes, but it might not be needed. Even worse, you're just looking at the worst of the first three batches. I mean, it's not very probable that the worst batch you're ever going to make is among the first three. This can really give you some serious issues later on the ongoing verification. Even though it's easy, it's not what we recommend to do. Of course, you can make exactly the same models in Fit model, that's what I'm showing here. Again, first by temperature, and later we will combine it. There you can see you can put in time, batch and time times batch, in this case for 15 degrees. 
Time times batch will be the term that can take into consideration that slope might be batch dependent, so cannot want common slope. But you can see it has a high P-value, but I don't like to use P-values because they are so sensitive to sample size and signal to noise ratio and so on. I prefer to use information criteria, which are more robust towards across different sample sizes and noise levels and so on. I prefer the Akaike information criteria, which is minus log likelihood, so the lower the better. If I take time times batch out, it's 188-... It actually drops to -193, meaning it's a better model. Now I have justified that I can have common slope. Then I can go to batch number. It has a more borderline P-value, still in the high end, but I'm not using the P-value, I'm using the information criteria. If I take it out, it drops from 100 -3 to -195. Still dropping. In this way, I have also justified, I can use common intercept. However, as I mentioned before, be careful here because these numbers are not independent. I'm not telling JMP that for now. You can do that better by going to Fit model and do the same model. The only difference, now I'm putting in measurement day as a random factor. Now I can correct for that these measurements are grouped, that numbers on the same day comes from the same analytical one. Now you can see that you get different P-values. Again, looking at the information criteria, you can see batch number times time -207, It drops to -239. Better model, justified common slope. But see now what happens when I take out my batch number. It's the same as before, just adding measurement day. It's -239. Now it actually increases to -236. I shouldn't take that one out, meaning I cannot assume common intercept. It's a good example why you need to put in measurement day, because otherwise you could be fooled by numbers are not independent. If you want to model across temperatures, as I mentioned previously, it's fairly easy to do. Just go to the degradation data analysis and put in this nonlinear part. Actually, you have the Arrhenius equation built in, and you will get across the four temperatures a common intercept, a common slope, and a common activation energy. However, I've just shown that, yeah, the common slope is fine, but the common intercept is questionable. We need actually separate intercept common slope, and you cannot really do this here. What you can do is that you can go in and say, I want separate parameters for all batches. But then you get separate intercepts, you get separate slopes, and in this case, you get common activation energy, which I think makes sense. But you cannot really here have the combination of common slope and separate intercepts. But there's a solution in JMP. If you go to the nonlinear platform, you can build your own fitting equation. There you can actually, with the Arrhenius, have, as you can see here, a common activation energy, a common slope, but separate intercept. So it can be done. But the challenge here is that you cannot put in random factors. You're having a hard time correcting for, it is not independent measurements. You cannot put batch either in as a random factor. You have a hard time making a model describing batches in general, which is typically what you need. For that, we actually like to go to Fit model and put in this Taylor expansion of the Arrhenius equation, as you're seeing here. 
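As an aside before the Taylor-expansion demo: the "add measurement day as a random factor, then compare information criteria" step just shown has a rough analogue outside JMP. The hypothetical Python (statsmodels) sketch below assumes a table with columns lnY, months, batch, and meas_day; it illustrates the idea and is not the presenter's script.

```python
# Fit Ln(response) vs. time with measurement day as a random factor, then
# compare information criteria for models with and without batch terms.
# File and column names are illustrative assumptions.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("stability_15C.csv")          # assumed columns: lnY, months, batch, meas_day
df["batch"] = df["batch"].astype("category")

def fit_ml(formula):
    # reml=False (maximum likelihood) so models with different fixed effects
    # can be compared on their likelihoods / information criteria.
    return smf.mixedlm(formula, data=df, groups=df["meas_day"]).fit(reml=False)

def aic(res):
    # AIC = -2 log L + 2k; k counts the fixed effects plus the random-intercept
    # and residual variance parameters.
    k = len(res.fe_params) + 2
    return -2.0 * res.llf + 2.0 * k

m_full   = fit_ml("lnY ~ months * batch")      # separate intercepts and slopes
m_slope  = fit_ml("lnY ~ months + batch")      # separate intercepts, common slope
m_common = fit_ml("lnY ~ months")              # common intercept and slope

for name, m in [("separate slopes", m_full),
                ("common slope", m_slope),
                ("common everything", m_common)]:
    print(f"{name:20s} AIC = {aic(m):8.1f}")
```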
To first, second, and third order, of course, putting in the batch number for different intercept, put in batch number times time to be able to handle that you might have a different slope, hopefully not. Even worse, you can also put in Arrhenius time times batch number to correct for you might even have batch dependent activation energy, which could be strange. But looking at the AICc, you will not take this time out first, and you see it -730. How lucky can you be? It drops to -67 as it falls. It's a better model. I have justified now that I can have a common activation energy. The same batch number times time, it has a borderline P-value. But again, looking at the Akaike, you can actually see it's still dropping. I can actually justify now that I also can have a common slope, which is, of course, a great thing. This you can also do in this model. As you can see down here, I have actually put in the Measurement Day as a random effect because this you can easily do in Fit model. You cannot do that in the degradation platform. Hopefully, you have seen here that there are many different ways of calculating the slopes. I've tried here to see what difference does it make for your Lower Release Limit. If you're running this one here, this small script, you can actually see here there are many different... I could do it by temperature, by temperature with random time. This degradation platform would come in everything, individual everything, the nonlinear and so on, the Taylor without random time and Taylor with random time. Over here, I just type in the slopes and standard or slopes we get from these models. Here you can actually see what is the Lower Release Limit if you only make one measurement batch release. You can actually see the method we recommend, which is Taylor expansion with random time, gives one of the highest release limits. Of course, when you have a higher release limit, there's a lower risk that you will later on have issues in ongoing verification. There we would require that all batches should start about 10.09. Otherwise, we cannot be sure they still work at shelf-life. You can see if we take the random time and do it on 15 degrees alone, this is where we have the requirement is a 15 degrees not combining, you get an even higher release limit. But that's because you get fewer degrees of freedom by doing a separate model. We can actually get a Lower Release Limit, which you can still rely on by building a model across temperatures. As you can see to the right, if you take two measurements at start, batch release to suppress measurement noise, you can reduce it further. If you take 10, you can go even further down. How many measurements you would like to do in start? Batch release, of course, depend on your measurement noise and it's distributed in the bottleneck. But it really makes a difference which method you're using. If you're not correcting for it's not independent measurements, then you can easily get a too Low Release Limit, which will then give you issues later on in ongoing verification. If you want to describe batches in general, for example, to setting the shelf-life, then you also need to add batches as a random factor. Now I'm putting both batch and Measurement Day as random factors. Now I'm not only describing the three batches I used to make my study, I'm describing batches in general. Then you can just go down to the Profiler. I put it at Arrhenius temperature 40.272 that corresponds to 15 degrees Celsius. You can see here's the state of shelf-life. 
They would like to prove that they have 24 months. If you look at the general line for all batches, with confidence limits, it nicely stays inside the limit. This company has no problem at all proving a shelf life of at least 24 months at 15 degrees. You can see I'm running a 5% one-sided alpha, because it's only a problem to be out on one side. However, if I want to predict where individual measurements could be, because this only shows where the true mean line is, I can run exactly the same model again and just change my alpha to 0.135% one-sided, corresponding to what's inside plus or minus 3 sigma, to describe everything. Down here you still see the same picture; of course, it gets a little wider with the smaller alpha. Then I would like to show the prediction limits down here. Unfortunately, you cannot show prediction limits on a profiler in version 17, so I'll briefly go to the JMP 18 Early Adopter version. It's not released yet, but it will come. Running the same model there, the great thing in version 18 is that you can see the confidence limits on the profiler, the darker band, but you can also see prediction limits, or individual confidence limits. This is where you can expect individual observations to be, given the shelf life of these batches. You can see that you can expect values slightly below the lower specification. For ongoing verification, we therefore recommend setting the requirement that results should be inside the prediction interval, which is slightly wider than the specification limits. This way you also avoid non-conformities due to measurement issues. Hopefully you have seen that there are a lot of pitfalls in shelf-life estimation and verification, but JMP has a good toolbox to work around these pitfalls and do it right. To conclude, I will just go back to my presentation and show the issues I started with again, now that we have been through the material. When we do shelf-life work with clients, we strongly recommend converting the shelf life to a release limit, because then you are sure that all future batches will live up to the shelf life; you are putting a requirement on where they start. Of course, do all models on Ln data: just turn the log transformation on in the model dialog, and you get a regression that is supposed to be linear. Enter the measurement time as a random factor in the model; then you correct for the fact that you probably do not have the same measurement level on all days. Build a model across temperatures with the Taylor expansion of the Arrhenius equation, which is easy to do, and remember not to have multiple registrations: the time 0 point should only be entered at one temperature. Then, for the ongoing shelf-life verification, we do not recommend the ICH method, because it assumes that you have seen the worst batch among the first three, which is probably not the case. It is also very important when you calculate the release limit that you get the right standard error on your slope and the right degrees of freedom: when measurements are not independent, it is not just n-1, and you typically need to put the analytical run in as a random factor. Last, but definitely not least, please base the specification for ongoing verification on the prediction interval made from the batches used to set the shelf life. Thank you for your attention. Hopefully, you got inspired to do shelf life in a very good way using JMP.
Thank you very much.
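Relating to the final recommendation above (verify against a prediction interval from the original study rather than against the raw specification), here is a small hypothetical Python sketch using statsmodels; the file and column names are illustrative, and the single-temperature OLS model ignores the random measurement-day structure for brevity:

```python
# Minimal sketch: fit Ln(response) vs. time on the original shelf-life data,
# then compute prediction limits at the verification time points.
import pandas as pd
import statsmodels.formula.api as smf

study = pd.read_csv("shelf_life_study.csv")        # data used to set the shelf life
fit = smf.ols("lnY ~ months", data=study).fit()

new_times = pd.DataFrame({"months": [6, 12, 24]})  # ongoing-verification pull points
pred = fit.get_prediction(new_times).summary_frame(alpha=0.10)  # 90% two-sided, 5% per tail

# Future individual results are expected inside [obs_ci_lower, obs_ci_upper];
# compare verification measurements against these limits rather than the spec.
print(pred[["mean", "obs_ci_lower", "obs_ci_upper"]])
```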
Thursday, March 7, 2024
Ballroom Ped 3
SiO 2 thin film has been widely used as STI liner, gate oxide, spacer, etc., in the semiconductor industry. The thickness of SiO 2 layers is strictly controlled and is affected by facilities, chambers, and measurements . Among these factors, thickness is directly susceptible to measurements. If measurement queue time is too long, true thickness of  the SiO 2  layer formed from thermal process may be distorted as thickness may increase naturally in the atmosphere. To analyse effects from queue time and measurements on SiO 2 thickness, JMP GRR analysis was introduced. After defining the operation, a cause-and-effect diagram is used summarize possible factors for thickness shifts. Next, thickness from coupons is collected, based on JMP MSA design platform. The thickness of each coupon is measured multiple times as repeatability tests and degradation tests, with the same repeatability tests conducted every three hours as reproducibility tests. Once the variability in thickness from repeatability and reproducibility is analysed using Xbar and S charts, GRR analysis is performed to evaluate current GRR performance. Finally, relationships between P/T ratios, alpha/beta risks, and spec tolerance, regression models between thickness and queue time are built to determine if the measured thickness is to be trusted.   Hi, everyone. I am Jiaping Shen. I am a Process Support Engineer from Applied Materials. Applied Materials is a leader in materials engineering solutions, is to produce virtually every new chip and advanced display in the world. There is an internal JMP program inside Applied Materials to help engineers solve engineering issues based on JMP. Today, as a member of the JMP program, I'd like to share how I use JMP to do gauge repeatability and reproducibility analysis on silicon dioxide thickness to assist queue time control and measurement capability evaluation. In wafer fabrication, process engineers rely on metrology tools to monitor each layer to ensure product quality. If measurement results are not accurate, it may lead to quality issue. So how are measurement results affected? If one tool measures the thickness of part several times and variations are huge, the tool repeatability is bad. If another tool measures the same part again, the gap between these two tools is huge. It means reproducibility is bad. The analysis about repeatability and reproducibility of gauges is called GRR analysis. In this project, I take silicon dioxide thickness as an example to introduce how to evaluate measurement capability. Different from other GRR project, I use measurement queue time levels to introduce reproducibility. Here is a general overview of the analysis of the flow. Based on the data collected, we evaluate the GRR performance and conduct root cause analysis to see if there is any further improvement. Then we discuss current processability and explore future opportunities. The silicon dioxide thickness was collected from 15 coupons on a wafer. Each coupon got measured four times after zero, three hours and six hours after silicon dioxide generation. Finally, we got 180 data points according to JMP, MSA Design platform. The thickness spec is from 97-103 angstrom. For GRR performance, we have four success criteria: P/T Ratio, P/TV Ratio, P/V Ratio, and P/M Ratio. Among the four criteria, the numerator is precision, which is calculated from variations due to repeatability and reproducibility. In this project, tolerance is six, and I will use P/T ratio as a success criteria. 
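As a rough illustration of how these ratios are computed (not the speaker's actual JMP output; the variance components below are invented placeholders), the P/T and P/TV calculations look like this:

```python
# Hypothetical sketch of the precision-to-tolerance calculation for a gauge study.
# The sigma values would normally come from an ANOVA/EMP analysis of the repeated
# measurements; the numbers below are placeholders for illustration only.
import math

USL, LSL = 103.0, 97.0          # thickness spec in angstrom (tolerance = 6)
sigma_repeat = 0.06             # repeatability std dev (assumed)
sigma_repro  = 0.05             # reproducibility std dev across queue-time runs (assumed)
sigma_part   = 0.24             # part-to-part std dev (assumed)

sigma_grr   = math.sqrt(sigma_repeat**2 + sigma_repro**2)   # measurement-system precision
sigma_total = math.sqrt(sigma_grr**2 + sigma_part**2)       # total observed variation

pt_ratio  = 6 * sigma_grr / (USL - LSL)   # precision-to-tolerance
ptv_ratio = sigma_grr / sigma_total       # precision-to-total-variation
print(f"P/T  = {pt_ratio:.1%}")
print(f"P/TV = {ptv_ratio:.1%}")
```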
How about GRR performance? The first model shows a P/T ratio of 8%, less than 10%, which means measurement capability is good, while the P/TV ratio is 31%, greater than 30%, which would mean measurement capability is bad. Why? This is because the part range is too tight, so we cannot trust the P/TV ratio; we need to rely on the P/T ratio, and it shows that measurement capability is good. How about the interaction between part and queue time? From the crossed GRR model, the interaction accounts for only 0.2% of the tolerance and is negligible. With the current capability, how likely are we to make a mistake in judging whether a part is within spec? The risk that a good part is falsely rejected is called alpha risk; higher alpha risk increases production cost. The risk that a bad part is falsely accepted is called beta risk; higher beta risk passes risk on to customers. During production, parts at the target have zero alpha and beta risk, good parts near a spec limit have high alpha risk, and bad parts near a spec limit have high beta risk. How about the alpha and beta risk in this project? Both are zero. Can we trust that? No, because all the parts are within the spec limits, which is totally different from actual production, so we cannot rely on the calculated risk. Next time, we should deliberately pick parts that are spread uniformly across roughly 90% of the spec range to simulate true production. The current measurement capability is good, but is there improvement opportunity in the future? I will use the Xbar-S chart to analyze the root causes of GRR from repeatability and reproducibility. In the top repeatability chart, the X-axis includes the 15 parts at three queue-time levels, and the Y-axis is the standard deviation, representing the repeatability across the four repeats. Overall, the standard deviation is very stable. Does queue time affect repeatability? The purple line is the average standard deviation for each queue-time level, and you can see there is no trend. How about the standard deviation for each part? You can see it is lower at the wafer center and higher at the edge, which may be attributed to higher stress at the wafer edge. Repeatability is very stable. How about reproducibility? Most of the parts are beyond the measurement-error red lines, so the metrology tool can differentiate between parts. The trending purple line indicates that the average thickness increased by 0.2 angstrom up to 6 hours, far below the spec tolerance of six, so long-term degradation risk is low. The M-shaped curve is what we want to get the best [inaudible 00:07:27] uniformity. If we overlay the three M-curves, they are parallel, so there is little part-to-queue-time interaction. Repeatability is stable, and reproducibility is also good compared to our spec tolerance. Still, paired t-tests between the first and fourth repeats are conducted to evaluate short-term degradation risk due to native oxidation. The difference is statistically significant but not practically significant compared to the spec tolerance, so there is little concern about part measurement degradation within four repeats, and the crossed ANOVA GRR model is safe. In the previous slides, we were talking about measurement capability. How about process capability? Process capability, Cp, is calculated from the ICC and the P/T ratio. The ICC in this case is 0.9 and the P/T ratio is 8.88%, so the final Cp is greater than 2 and falls into the green region. It means the process is capable, the measurement is capable, and it is stable within 6 hours.
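A back-of-the-envelope check of the Cp / ICC / P-T relationship quoted above, under the common assumption that the ICC is the part share of total variance and P/T = 6·σ_measurement / tolerance; this is my reconstruction, not the speaker's exact formula:

```python
# Hedged arithmetic check: back out process capability from the reported ICC and
# P/T ratio, assuming ICC = part variance / total variance.
import math

icc = 0.90      # intraclass correlation reported in the talk
pt  = 0.0888    # P/T ratio reported in the talk
tol = 6.0       # spec tolerance (103 - 97 angstrom)

sigma_ms    = pt * tol / 6.0                     # measurement-system std dev
sigma_total = sigma_ms / math.sqrt(1.0 - icc)    # since (1 - ICC) is the measurement share
sigma_part  = math.sqrt(icc) * sigma_total

cp_part  = tol / (6.0 * sigma_part)     # capability of the underlying process
cp_total = tol / (6.0 * sigma_total)    # capability including measurement noise
print(f"Cp (part only)         ~ {cp_part:.2f}")
print(f"Cp (incl. measurement) ~ {cp_total:.2f}")   # both > 2, consistent with the talk
```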
However, because the ICC is highly depending on sample selection, ICC is less reliable compared to P/T ratio, so we better keep ICC fixed and move P/T horizontally as our first move if we want to do some adjustment. Keeping ICC fixed and moving P/T ratio from 0.08 to 0.16, we reach the green boundary. In this case, spec limit is tightened from 6 to 3.35. How about other risk? There are three graphs show the P/T ratio, alpha risk and beta risk, with tolerance reduced from 100% to 30%. As tolerance is being reduced, P/T ratio increased to 29.6%, marginally acceptable. Alpha risk is still under 5%. Beta risk goes beyond 10% when tolerance is reduced by 40%. Based on three criteria, we can tighten tolerance range from 3 to 3.6 and keep P/T ratio around 50%. This graph summarizes how we iteratively and continuously improve process and measurement capability in different scenarios. When Cp is greater than 2, P/T is less than 0.3, marked by the light green stars. We should consider tightening spec until Cp is equal to 1.33 to be ready for improvement. When Cp is less than 1.33, P/T is less than 0.3, marked by the blue star. We should improve process part-to-part capability and reduce the ICC until Cp is equal to 2. When Cp is less than 1.3, with P/T greater than 0.3, marked by the orange star, we should consider optimizing GRR performance to reduce P/T ratio to less than 30% and improving Cp at the same time. That is how we could make decision to improve measurement or process in different cases. This is how we conduct GRR analysis based on different queue time levels. Thank you for listening.
Wednesday, October 23, 2024
Executive Briefing Center 9
Reliability assessment of devices and interconnects in semiconductor technologies is typically done for technology certification and periodic monitoring, using relatively small (single digit to tens) sample sizes per condition. Volume manufacturing data can be used over time to assess dielectric reliability by ramped voltage-breakdown measurements on scribe-lane test structures. Over time, this can provide a detailed view of dielectric behavior, including a mixture of intrinsic and extrinsic mechanisms affecting dielectric integrity. In particular, low failure-rate outliers or tails can be detected and addressed, which may otherwise pose field-quality risks. For practical reasons, the ramp may be stopped at a target voltage to reduce test time and avoid damage to probe cards and needles, which may result in a small number of data points being censored. Fitting large data sets with a small number of censored data points can lead to convergence challenges, resulting in incorrect fitting parameters and lack of confidence intervals, as well as posing significant computational challenges. This work explores these challenges with the JMP Life Distribution platform and examines alternatives and solutions to allow correct analysis, fitting, and extrapolation.     Hello, JMP Community. My name is Mehul Shroff. I'm with NXP Semiconductors in Austin, Texas. Along with Don McCormack from JMP, we are going to present a study on the Assessment of Dielectric Reliability in Semiconductor Manufacturing. To introduce the topic, this is a cartoon of a basic semiconductor transistor known as a MOSFET. Within the structure, we have a gate dielectric that is used to insulate the gate from the channel, and this is what acts as the switching element in the transistor. The gate controls the flow of charge carriers in the channel. The integrity of the gate dielectric is very crucial for device reliability in the field. One way we can look at this is through high volume manufacturing ramped-voltage breakdown studies, measurements that can help us detect drifts and defectivity issues in the gate dielectric. This is typically done on scribe-line test structures, where we do have smaller areas, but over time we can get larger sample sizes. In the semiconductor world, we think of field quality in terms of the well-known Bathtub Curve, where the observed failure rate is seen to initially decrease with time then increase later in time. This is due to the contributions of a few different groups of mechanisms. The first, known as Extrinsic Reliability mechanisms, mainly deal with early failures such as latent defects and infant mortality. We can reduce this quite a bit through various screens and tests that we do before shipping the parts out. The next one comprises of Constant or Random Failures such as soft errors, latch-up, ESD. Then the last one comprises of intrinsic reliability mechanisms which focus on the wear-out of the device, including dielectrics, that occur over time and increase with time. When we collect dielectric breakdown data, we typically would like to represent it by a Weibull because that's what the ideal distribution would fit. But in practice, our distributions are often mixtures, as is shown this data set, where we can have an extrinsic tail due to defects in local thinning, then most of our data, hopefully, is intrinsic or natural breakdown. 
But that can be convoluted by process variations, such as those shown here in this wafer map, where the gate dielectric thickness varies over the wafer, causing differences in the intrinsic breakdown and the curvature we see on the high side. Then some sites don't fail within our test time or maximum voltage. To avoid probe needle damage and keep the test time manageable, we stop the testing, and those points therefore become our censored data. Now, why do we need to fit this? Mainly because we want to be able to monitor drifts over time that result in changes of the parameters, we want to understand the impact of process changes or improvements, and we want to be able to project out to low-PPM behavior for high-reliability applications such as automotive. But what we see is that when we try to fit these mixed distributions, we often run into problems where we can't converge and therefore don't get confidence intervals, as shown here, where we get the nominal fit but not the confidence intervals. This can be due to a combination of large data sets, imperfect or mixed distributions, and censoring. Here's an example of the same data where we have a distribution with some censoring, and we can't converge. If we treat the censored data as failing data, which is clearly incorrect but useful for the purposes of illustration, we are able to get it to converge. But we see that the fit itself is slightly worse than what we had before and doesn't quite fit the intrinsic distribution where we'd expect it to. This is just an example to show that the presence of censored data plays a role in our ability to fit this data. Here's an example where we had a process improvement: we had breakdown between the gate and the source/drain of the transistor, resulting in a pretty severe extrinsic tail, and through a series of process improvements, we substantially cleaned it up. We see here that in one data set we had some censored points and were unable to converge; in the other data set we didn't have any censored points and were able to converge fine. Even though we can tell visually that we have an improvement, we are unable to judge statistically what the improvement is. We can see the same thing here in our distributions, where the mean distribution changes a little bit, not a whole lot, but the minima at each site have significantly improved if you look at the scale and the color gradient. Here we took the same data as before, and we see we have the same convergence problem. Here we have a different data set where we also had two versions of the process, one slightly better than the other, though both showed significant tailing. However, there was a difference in the intrinsic fits. In this case, both had censored data, and yet they both converged. To test whether this was related to the size of the data set, we took this data set, inflated every frequency, I think by 10x, to maintain the same distribution, and repeated the analysis. Even though the number of censored points and the total number of data points increased, we were still able to converge. Based on this, we were able to rule out the size of the data set as a factor. We thought that this was instead driven by the Weibull scale and shape factors. At this point, I'd like to hand off to my co-author, Don, to take it from here. Please go ahead, Don. Thank you. Thanks, Mehul. Let me get into slide mode and we will start talking.
Even though we're talking in the context of semiconductor manufacturing, this really applies to any case where we're collecting large amounts of data and there can be various factors that are impacting the failure mechanisms. Like I said, just because we're talking about semiconductor data doesn't mean it only applies to semiconductor data. Let's talk a little bit about what happens with the Life Distribution platform, because that's what we're using primarily. Just to look under the hood, get a little bit of an understanding of what's going on and what might be causing these poor fits and these non-convergence issues. Let's start off by saying that what we're trying to do with Life Distribution is we're trying to optimize a function, specifically, it's the likelihood function that we're looking at. The important thing about the likelihood is that what it is going to give me is the most likely or the highest probability parameter estimates given the data. That's the important aspect because this approach is data-driven. The other wonderful thing about likelihood functions and the Weibull distribution is that because there's only two parameters in the Weibull distribution, the alpha and the beta, this is really easy problem to look at visually. If you take a look at the graph on the left, that's just the likelihood function plotted in three dimensions where I've got beta and alpha on my X and Y. What we're trying to do is we're trying to find the minimum point, the minimum of that graph. Looking at the graph, you could tell that regardless of where you start on the graph, wherever the starting values in the parameter space, you're pretty quickly going to converge at that minimum point. When the data is behaved well, things go quickly, things go really, really smoothly. Now, question is, what went wrong or what happened to cause the non-convergence issue that we saw in the data that Mehul was talking about. In most cases, a lot of this can be traced back to observations in the data set that are just not representative of the principle underlying model. Now, obviously, the first thing that a lot of people think about are outliers, so I can have outliers in the data that are causing problems. The data can come from more than one distribution. I could have multiple failure mechanisms, and they can be part of the data set. To be fair, to be honest, and to be complete, obviously, the underlying model could be wrong. For example, maybe the Weibull wasn't the best distribution. We're going to assume that because there's a fairly decent device physics understanding that the Weibull distribution really is the best distribution to use in this case, we're going to discount that very last bullet. What are some possible... Knowing that we have some of our data that's not representative, what can we do in the context of some of the JMP platforms to help us fit a better model or maybe diagnose our problems? What is to change the algorithm or to change the parameter values? Now, this is not directly available in Life Distribution, but there's another wonderful platform within JMP, Nonlinear, that allows me to do this. A little bit more flexibility than Life Distribution. Sometimes when I run into these problems, I can use Nonlinear. I can obviously find and remove those values that non-representative of the main underlying model. Lots of tools in JMP to allow us to do that. I'm going to show you a tool you might not be familiar with, and that is Fit Parametric Survival. 
Using the data filter to be able to find problem observations in the data. Then finally, we could find and build a more representative model. This obviously could be done using the Nonlinear platform. However, we're going to use a much easier part of the Life Distribution platform, the Fit Mixture, to be able to do that. Again, just a real easy, simple recap of the tools. We're going to be looking at Nonlinear, the Nonlinear platform. We're going to be looking at Fit Mixtures under the Life Distribution platform, and then we're going to look also at Fit Parametric Survival. Let me go to the examples. I'm going to start with a real simple example. I've simulated data from a Weibull distribution, so nothing fancy here, very straightforward. However, what I'm going to do is I am going to add an outlier to the data. Here's my simulated data with the outlier. So 700,000 observations were the bulk of the data. One observation was the outlier, so this guy right out here. Now, you might say it's inconsequential, easy to visualize that, pull it out of the data. However, this does cause a problem with the Life Distribution platform. If I were to run that with the Life Distribution platform, you'll notice that I have non-convergence issues. It's all due to this one outlier right here at the right-hand tail. What can I do in those cases? As it turns out, Life Distribution gets me most of the way there. I can use the Nonlinear platform to get me the rest the way there. Let's open up Nonlinear. The one difficulty with Nonlinear is that you have to know the form of the Log Likelihood function. There's a lot of that available in the literature, Certainly, if you know anyone with a statistical background that can help you come up with these distributions. I've got my Log Likelihood in one of my columns, my data set right here. I'm going to use the Nonlinear platform. What I'm going to do is I'm actually going to start with the parameter estimates that I got from Life Distribution. I'm going to start with a beta of about 35, and I'm going to start with my alpha of about 5.6, and I'm just going to click Go. As it turns out, with that amount of control, I am able to converge and come up with parameter estimates. If I were to actually calculate the log likelihood, I see that really Life Distribution came close. It just didn't get that last step along the way. I use Nonlinear platform to allow me to do that. Now, that said, there still can be issues. Here I have another set of data. Let me show you the graph of it first. This looks, again, almost identical to that first example. The only difference here is that I push that outlier further to the right. Obviously, if I were to try to I get this in Life Distribution, I get the same problems I saw with that first set of data. If I were to pull this up into Nonlinear and run my convergence, I get a warning message saying it converged, but I've got one loss formula result that had missing value. If I were to go and save these estimates, and what I will see if I go under that column with my second example, the two observations, two of those observations, two of my intervals, you can't calculate the log likelihood. This is the problem that's occurring with the Life Distribution platform is that when I have these observations that are sitting way out on the right tail, it's breaking the log likelihood function. I don't have time to go into a whole lot of detail as to when exactly this happens. 
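Outside JMP, the same "write down the log-likelihood yourself and hand it to an optimizer with sensible starting values" idea can be sketched in a few lines of generic Python; the data here are simulated for illustration, and this is not the presenters' script:

```python
# Minimal sketch: Weibull log-likelihood with right censoring, minimized with a
# general-purpose optimizer starting from rough parameter guesses.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

rng = np.random.default_rng(0)
t = weibull_min.rvs(c=5.6, scale=35.0, size=5000, random_state=rng)  # simulated breakdown values
censored = t > 40.0            # ramp stopped at 40: those points are right-censored
t = np.minimum(t, 40.0)

def neg_log_lik(params):
    beta, alpha = params       # Weibull shape and scale
    if beta <= 0 or alpha <= 0:
        return np.inf
    # Failures contribute the log density, censored points the log survival function.
    ll_fail = weibull_min.logpdf(t[~censored], c=beta, scale=alpha).sum()
    ll_cens = weibull_min.logsf(t[censored], c=beta, scale=alpha).sum()
    return -(ll_fail + ll_cens)

start = np.array([5.0, 30.0])                   # e.g., estimates from a first rough fit
res = minimize(neg_log_lik, start, method="Nelder-Mead")
print("shape, scale:", res.x, " converged:", res.success)
```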
In our slide deck, we have one slide in there that shows you how to calculate where those values might be. But this is what's happening when you've got those outliers that sit far off in the tail. Let's move on to the second example. Here we've got something slightly different. Rather than having just one observation that's sitting out on the tail, let's say we've got really two sets of data. We've got two failure mechanisms. I've got my main failure mechanism that's simulated from the same set of data. Then I've got this secondary failure mechanism that's about 10% of the observations that I see in the original data set. Rather than separating them out cleanly, I get data that looks like this, and I'm just going to change my side-by-side bar graph to a stacked bar graph. In essence, what I'm looking at is a slightly fatter distribution, a little bit fatter than what I would expect to see from a Weibull. Let's say I didn't know that by just looking at the data and I go into the Life Distribution platform. As it turns out, I can fit a Weibull, a single Weibull of this data. But looking at my comparison distribution plot, there's pretty severe model misfit. Even though I had convergence, I've got some pretty bad model misfit. Here's a case where, let's say I have enough process understanding to know that I've got multiple failure mechanisms. What I can do is I can go in there, into the Life Distribution platform, and I could say, get a mixture. Here I'm going to start with two Weibulls. I can specify whether or not they're on top of one another, whether they're slightly separated or whether They're completely separated. I'll just assume that they're right on top of one another, and it fits a reasonably good model. As a matter of fact, let me do one other thing, too. I'm going to change this axis so that it is on the Weibulls scale. There we go. You could do that with any graph within JMP. You'll see that this is a considerably better model fit. Matter of fact, if I go back up to the top, I've got my model comparison. I noticed that I do a much better job fitting that second Weibull distribution in my data. Like with the first method, there are things that will cause this approach to break, and I've created a data set to illustrate when you might start to worry and what you might be able to do. In this case, I have got a second distribution. Again, looks very much like that first distribution, and let me put them side by side, so we can compare them. The only difference is that, one, I have observations, just like in the outlier case where I pushed them further out in the tail. Secondly, not quite as obvious from the graphs, in that second example, I have considerably fewer observations in that upper tail, about one 100th number of observations. I think I used about 70,000 observations in that first simulated second data set, and only about, I think, a thousand observations, or maybe a little bit more than a thousand observations. So considerably smaller data set. If I were to go and fit that using Life Distribution, actually the fit looks better. However, I still have my non-convergence issues. Additionally, when I try to fit my mixtures, in this case, I fit my two Weibulls, I have non-convergence issues. I could certainly try different sets, different combinations of observations, of distributions. I'd be hunting and pecking in that case. Really, the problem here is that I've got these observations. I've got a small set of data. It's pushed way out on the tail. 
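For reference, the two-component mixture that the Fit Mixture option is fitting here can be written, in my own notation, as a weighted combination of two Weibull distributions:

$$F(t) \;=\; \pi\,F_{1}(t) + (1-\pi)\,F_{2}(t), \qquad F_{i}(t) \;=\; 1 - \exp\!\big[-(t/\alpha_i)^{\beta_i}\big],$$

with mixing proportion $\pi$ and separate scale/shape pairs $(\alpha_i,\beta_i)$ for the two failure mechanisms; exact failures contribute the mixture density $f = \pi f_1 + (1-\pi) f_2$ to the likelihood, and censored or interval observations contribute through $F$.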
There's got to be better approaches to be able to model this. As it turns out, because this data is separated, if we take a look at the observations in this region of our space, there are no observations. It might be safe to think that these observations might come from one of my distributions and the rest of the observations The rest of the observations come from my other distribution. I can do that. I can model these two different distributions differently. It's pretty straightforward. All I do is I exclude and hide the part of the data set that I want to, and then I just run the like distribution, and I've actually preset that up. You see, as it turns out, in this case, and in this case, I have fit... Let me re-do that. I need to actually delete or hide and exclude all of my observations. There we go. Hide and exclude. Let's do that one more time and run that. One more time. Something is happening with my script, so we're going to do this manually. We're going to hide and exclude. We are going to go under my Life Distribution platform, and we're going to be putting in our Example 3 frequencies, our lower and our upper points, and we are going to fit the Weibull distribution. As it turns out, we see that this does, in fact, fit data when I don't include both sets of data. At this point, I would probably go back. I would probably flip the observations and refit that other part of the data. When there is clear separation between the sets of data, then I can just fit separate models for each. Let's go on to our third and final example. That is the data that Mehul was nice enough to share with me in terms of being legitimate semiconductor data that is real life, all the warps, warts, and bunts that data usually has. Now, if you look at this, this looks very different than what we've seen in the past. I not only have observations in the right tail, but I've got observations in the left tail as well. I don't have that clear separation that I in the other data sets. Now, granted, I can make arbitrary decisions in terms of let's fit Life Distribution to those observations, maybe a couple observations up here and then the middle of the observations. But that sometimes that's not really an ideal approach. I see right off the bat, I tried to set up the Nonlinear platform to fit a mixture model. You'll notice that there's quite a few of the intervals that I can't calculate the likelihood for. So very problematic set of data. Aside from trying to fit separate Life Distributions to parts of the data set, another option would be if I were to have access to a richer data set, more information about where this data came from, I might be able to diagnose where the problem was. Now, keep in mind that here we're looking at probably about two months worth of data, I believe. It comes from multiple lots, multiple wafers, hundreds of thousands of observations in this data set. It's likely to have multiple sources of failure mechanisms in here. The question is, are there certain parts of that data that are really causing the problem with this set of data? What I've done, and let me go to the data set right now. Again, I want to thank Mehul for sharing this information, and I've really only taken a subset of the data because of time constraints in terms of how long it takes to fit. But this is the original data. There's two different versions. I have four different lots in here. Each lot has anywhere from a few to about a dozen wafers. The question is, where are the problems coming from? 
I am going to use a platform that hopefully you're familiar with. If not, it's under the Analyze menu, under Reliability and Survival: Fit Parametric Survival. If you are familiar with Fit Model, it looks very similar to Fit Model. I can set it up like Life Distribution, where I have a lower and an upper column when I have interval censoring. Different from Fit Model, I get to specify factors not only for my location, so factors that will influence where the center of the data is, but also factors that will influence how spread out the data is, the scale. I have set this up already. I'm just going to keep this very simple model and run Fit Parametric Survival. When I do that, you'll notice right off the bat that I have all of these missing values. That's because I've got non-convergence. What's nice about this particular platform is that I can use the data filter to try to diagnose that problem. I'm going to go into my data filter, which I've set up as a conditional data filter. I click on the new lots, and you'll notice that with all of the new lots, I get estimates. It's only the old lots that are causing me the problem. Let me see if I can dig a little bit deeper. If I look at the AAA lot, that's a problem. Now I know that it's somewhere in this lot. I'm going to go to wafer one. That looks good. Wafer two, three. Here's one of my problems: something is going on with wafer four. At this point, I could probably pull this out and take a look at it using all the tools that we've seen so far. I might have to go back into my data store and figure out whether there is something else about the way this lot was processed that caused it to be different. But this one is definitely different. If I look through the rest of my wafers, they're all okay, except for this very last wafer: wafer 15 is also an issue. The whole beauty of this approach is that I've got a diagnostic tool that would let me look more deeply into the data if I had that richer data source. Let's wrap things up. In summary, large-scale reliability data is common in the semiconductor industry. It's a valuable tool for detecting drifts, particularly when you've got high-quality products with very low-ppm defectivity. As we've seen, the distributions are very complex, often including multiple failure mechanisms and multiple distributions. Because of this, we need to rely on some slightly more novel fitting approaches, not only to fit reasonable models to this data, but also to diagnose problems that exist in the data. As always, and probably most importantly, subject matter expertise is incredibly important: history and process knowledge tell us the best direction to head in. Thank you for your time, and I hope you got something out of this talk. Thanks.
Wednesday, October 23, 2024
Executive Briefing Center 150
Life got you down? Do you have two failure modes and you're not sure how to make reliability predictions? There is a path to success! Using a straightforward method, an Arrhenius data set of transistor lifetimes with two independent, lognormal failure mechanisms is modeled in JMP. The upper confidence bound on the probability of failure at use conditions is also estimated. But what about future testing? You may need to test similar parts in the next qualification. How should you design your life tests when there is more than one failure mode? Again, there is a solution! A graphical method for planning life tests with two independent, lognormal failure mechanisms is demonstrated. Reliability estimates from simulated bimodal data are shown with the contour profiler, helping you navigate this difficulty. This simple graphical approach allows the practitioner to choose test conditions that have the best chance of meeting the desired reliability goal.   Hi, my name is Charlie Whitman. I'm the JMP Systems Engineer for the New York and New Jersey regions. For most of my career, I was a reliability engineer in the semiconductor industry. Today, I'm going to talk to you about a subject that is near and dear to my heart, and that is reliability. We're going to be talking about reliability prediction under two failure mechanisms. I want to start with where we're headed. Back in college, I had a mechanical behavior professor, and before class one day, he showed us a specimen that had broken in test. He asked the class, "Why did this thing fail?" Then he answered his own question: "Too much stress." That's what we're going to be talking about today: different types of stress and how they can cause failure. As we know, there can be multiple failure modes operating on our parts that can cause failure. Again, my background is in semiconductors. This is an example of some ICs on a circuit board; they got too hot, and they failed. Another example is corrosion. Maybe there was a little bit of water ingress into the package, that caused some corrosion, and then the parts failed. Another possibility is ESD. Perhaps somebody touched this circuit board while they weren't grounded and caused an ESD event. The bottom line is that there are multiple mechanisms out there that can cause failure, and we need a way of dealing with that and modeling it so we can make predictions. How do we handle that? The literature is full of examples of how to model this. Typically, we assume that these failure mechanisms operate independently. You can think of the mechanisms as all competing with one another to cause failure. When they act independently, that means they're not looking over their shoulder to see what the other mechanisms are doing and then making an adjustment. They just stay in their lane, and they do the best they can to cause failure first. How do we do the analysis then? If we want to analyze the results for, say, one particular failure mode, and we assume independence, we can treat all the failure times for the other modes as censored. What do I mean when I talk about censoring? Here's an event plot demonstrating what this means. Let's suppose I have three parts on test, and part number one is going along for, say, 10 hours, and after 10 hours, it fails.
Then part number two is going along, and it goes along fine for a little bit longer, and then at 15 hours, it fails. Then there's part number three. With part number three, we get up to 20 hours, and maybe we have to stop the test, but it doesn't fail. Now, it would have failed if we had kept going, but we didn't keep going; we had to stop the test. It turns out there is information in that. We don't want to just exclude this data point or pretend it failed at a different time. There is a little bit of math behind this, but JMP does all the heavy lifting for you. If you have a censored data point, you should use it, and it can help you make your estimates. In today's talk, I'm going to assume that we have two active failure modes, they're both independent, and they both have lognormal failure times. As reliability engineers, we have to come up with a figure of merit, something we care about. When I was in the industry, a common figure of merit was the median time to failure. That's the time for 50% of your population to fail. I was never a big fan of using the median time to failure. Imagine if you have parts in the field and you have a 50% failure rate; that's pretty bad. You want to know the time to something much smaller, maybe the time to 10% failure or the time to 1% failure. I think an even better metric is the probability of failure. I want to be able to make a claim like, "I think the field failure rate is going to be half a percent in 5 years." Now, there's an old expression in statistics: "Nothing lies like an average." If I take an average of a sample, I have to understand there is some uncertainty in that estimate, and I acknowledge it using confidence bounds. The same is true if I make a prediction of what the field failure rate is. If I say the field failure rate is 0.5%, then it's 0.5% plus or minus what? That's where confidence bounds come into play. I'm going to use the upper confidence bound on the probability of failure as my metric. It's a little bit more conservative, but it's conservative in a rational way; I'm not making some arbitrary choice of what my upper bound or my metric is. I'm going to use the upper 95% confidence bound on the probability of failure. I'm going to give a little background here so we're all on the same page. I want to define a few things. I'm going to talk about F, which is the probability of failure, and R, its complement. R is 1 minus F, and that's just the reliability. I'm also going to be discussing two failure modes, A and B. These are arbitrary labels; it doesn't matter what they are. Now, if I have both modes operating and both can cause failure, I'm going to have an overall probability of failure, which I'll call Ftot. That's for when both mechanisms are operating. Naturally, I'm going to have an Rtot as well, 1 minus Ftot. The key to analyzing the data is that we're going to use an approach that mimics that of two independent components in series. Let's suppose I have two components. I could have two circuits or two engines or whatever, and they're operating in series. That means if one fails, the entire system is said to fail. In that case, if I can assume independence, the overall reliability of my system is equal to the product of the individual reliabilities.
Since I know that R is equal to 1 minus F, I can back out what the overall probability of failure is for both mechanisms, and that's given by this expression here. I'm going to make heavy use of the Arrhenius model, and this is the Arrhenius model: the median time to failure is proportional to the exponential of inverse temperature. If I take the logs of both sides of that expression, it looks suspiciously like point-slope form. Here is a constant, here is the slope, and here is my Y. If I were to plot the log of the median time to failure versus 1 over kT, the slope of that line would be the activation energy, and the intercept term would be log C. I'm going to be talking about an actual data set of lifetimes for something called a GaN FET. GaN stands for gallium nitride, and FET stands for field-effect transistor. Basically, there were some field-effect transistors, or FETs, on life test at high temperature, I have failure times for those, and I did an analysis for two failure modes. But I wanted to show a little bit about how this device works. I apply a voltage between the source and the drain. Then a current will flow between the source and the drain, and that current is controlled by the gate. I can apply a voltage to the gate, and I can either shut that current off or let it flow. As we know, nothing lasts forever. What this means is that I can apply a voltage between source and drain and a constant voltage on the gate, but over time, this current might degrade. It'll change, and I don't want it to change. I want it to be constant, but it starts to vary. Another possibility is that instead of the current going from the source to the drain, where I want it to go, it goes from the source up to the gate. Now I have gate leakage, and that's something, again, I don't want. Another possibility is that the voltage I need on the gate to shut off the current can become unstable, and it can vary. That means, again, my device won't be functioning properly. I have all these possible mechanisms which can be causing failure. I'm just going to assume that there are two, and I'm going to give them the labels A and B. Also, I want to talk a little bit about accelerated life testing. Suppose we have products in the field, and we'd like those products to last a long time, to last years. But if you want to prove out the reliability of a new product, you can't afford to wait years and years to test. You have to get the testing done much more quickly than that. That's why we use accelerated testing. The idea is that we're going to up the stress. We increase the stress on the part, we run it under some high-stress condition, and that makes the clock run faster. The assumption is that the failure mechanisms aren't changing; we're not introducing anything new. Everything just happens more quickly. There are various stressors that we can use. We can use temperature. Back when I was in semiconductors, we used voltage very often. You can change the environment; I can make it a humid environment or a dry environment. Here we're going to focus on temperature. I want to show you an Arrhenius plot. This is the failure time on a log scale, plotted versus 1 over kT. Remember, this is inverse temperature. Over here on the left side of the plot is actually high temperature, and on the right side of the plot is low temperature.
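For reference, here are the two expressions just referred to, written out from the description above (k is Boltzmann's constant, E_a is the activation energy, and C is a constant):

```latex
% Two independent modes acting like components in series (competing risks)
R_{\mathrm{tot}} = R_A \, R_B,
\qquad
F_{\mathrm{tot}} = 1 - (1 - F_A)(1 - F_B)

% Arrhenius model for the median time to failure, and its point-slope (log) form
t_{50} = C \exp\!\left(\frac{E_a}{kT}\right),
\qquad
\ln t_{50} = \ln C + E_a \cdot \frac{1}{kT}
```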
I run my life tests at different high temperatures, I have my failure times, and I can fit a line to that. The slope of this line is my activation energy. I'm going to take this data, and then I'm going to have to do something terrible: I'm going to have to extrapolate. We've all been told we shouldn't extrapolate, but in accelerated testing, there's no choice. We have to extrapolate, because look at these failure times; they could be on the order of a million hours or something, and we can't wait that long. We do this extrapolation, and when we have it, we can then get, say, a median time to failure, or we can know the distribution of failure times at the use condition. When we know that, we can calculate things like the time to 1% failure or the probability of failure or what have you. I want to give a little heads-up and show you where we're headed. Here are some case study results. I have this data, which I obtained from a customer, and I analyzed the two modes, mode A and mode B. Here I'm just looking at the profiler. I have the distribution profiler for mode A. Again, I found this by treating all the mode B failure times as censored; that way I'm analyzing only mode A. In the profiler, I have my two factors: the temperature of the device and the time of interest. Let's say there's an industry standard, and the industry standard is that I need to know the reliability at 150 degrees C. Also, we want to know how long it's going to last, or what the probability of failure is after 5 years; 43,830 hours is 5 years. Using the profiler, I see that the probability of failure is about 5 times 10 to the minus 22, and the upper bound there is about 0.001. Then I can do the same thing for mode B. Here, for mode B, I'm going to treat all of the mode A failures as censored, so I have only mode B failures. When I do that, again under the same conditions, my probability of failure is very small, 4 times 10 to the minus 89, and my upper bound is about 1 times 10 to the minus 17. Compared to mode A, which had an upper bound more like 1 times 10 to the minus 3, the probability of failure due to mode B is much, much smaller. It looks like mode A is really dominating at lower temperature, at use conditions. I'm going to show some more of this in a little bit. Then, after I talk about this case study, I'm going to change gears a little and talk about how we plan a life test. Let's suppose you have a part that you're going to test, but you know that you have more than one failure mode, and you want to plan around that. What temperatures do I use? How many temperatures do I need? And what happens if there's a big difference in activation energies between these modes, and things like that? Well, I boil it down to a contour plot. In the contour plot, I have all these input factors. These are all my planning values. I have z_A and z_B; these are Z-scores for my two different failure modes, and I'll show how to calculate them later. I have other planning values as well: how many parts per temperature do I have, how many temperatures do I use, things like that. What I get out of it is this contour plot, which is basically a response surface. I have my two factors, z_A and z_B, and as they increase, the upper confidence bound on the probability of failure goes up, up, up. So everywhere on this contour here, for all those values of z_A and z_B, the upper confidence bound is 10 to the minus 2.
Here, for all these values of z_A and z_B, the upper confidence bound is now 10 to the minus 3, et cetera. We're going to be using this contour plot in a little bit, and I'll show you some more. Let's get into the case study. Again, I obtained this data from a customer. There were 48 GaN FETs tested over three temperatures: 320, 337, and 355 degrees C. The test was run for a good long time, but after about 2,100 hours, the test was stopped. This is just a summary table showing what was done. Here the DUT is the device under test. These are the temperatures it was run at, and then I had different numbers of parts at each temperature. Notice that the distribution is not equal; I have fewer parts at the highest temperature compared to the lower temperatures. That's actually a good idea, because remember, what we're going to do is extrapolate down to use conditions. When we extrapolate, we want to make sure that the confidence bounds on whatever estimate we come up with are as narrow as possible. It turns out that if you pile up your parts and put more at the lower temperatures, that confidence bound is going to be a little bit narrower, and that's good. That's why it was done this way. Then, looking at the failures, we see that we had a lot more mode A failures than mode B failures. But clearly, we're getting more mode B failures as the temperature increases. Maybe the trend is not quite so strong with mode A, but definitely for mode B, we can see we're getting more failures at higher temperature. What I did was take the life data, do an analysis, and get out the activation energies, and I was able to create this Arrhenius plot. Again, I have my failure time on the log scale versus 1 over kT. The slope of the red line here is the activation energy for mode B, and the blue line is the activation energy for mode A. If we look at this, you can think about extending these lines out. If I extended them out to higher and higher temperature, the failure times for mode A would be longer than those for mode B. What that means is that mode B would dominate at higher temperature, and that's what we see: mode B gets stronger and stronger as the temperature goes up. By the same token, if I extrapolated to lower temperature, the failure times for mode B would be very high, much higher than we would get for mode A. The parts would fail by mode A first, and mode A would dominate at lower temperature. Again, that's what we're observing. This is just an example of what the data looks like. For example, I had a part here that was run at 320 degrees C. It failed after about 261 hours, and it failed due to mode A. I also have a censoring column, which tells JMP whether the part failed or was censored. Again, JMP is going to do the heavy lifting and do the analysis for us behind the scenes. All we have to do is tell it which parts failed and which are censored. I used the Fit Life by X platform to make predictions, and I did this one mode at a time. I have data for mode A only, and again, all the mode B failure times were censored. When I did that, I got my failure times here. I have my distribution at the three different temperatures. JMP produces this.
You can see that the spacing here between, say, 190 and 170, which is 20 degrees, is different from the spacing between, say, 290 and 270. That's because we're doing a transformation; this axis is actually transformed from 1 over kT. That means the slope of this line is still my activation energy, and I can do my extrapolation to use conditions. My use condition is about 150 degrees C. When I do this, I see that my median time to failure is maybe a little bit over 1 times 10 to the 11th hours. I also know the distribution of failure times here, so I can calculate things like the time to 1% failure, or the probability of failure in 5 years, or something like that. Once again, we have our distribution profiler, so I can put in different values if I want to. I can move these things around to see what the probability of failure would be under different conditions. I'm also getting my Arrhenius parameters here. JMP calls the intercept term beta-naught; that was my log C in my presentation. Here, the intercept term is right around -40. JMP calls the activation energy beta 1, and that's about 2.4 eV, and then there's the shape factor, which is about 1.6. Basically, from these Arrhenius parameters, you can predict the log mean, and the log mean is a function of temperature T. Then, when you know the shape factor, you can completely describe the distribution. That's why we're able to predict what the probability of failure is at these conditions, or at any temperature we care about. Now I can also do the same thing for mode B. I analyze the data for mode B, and here, again, I treated all the mode A failures as censored. If you remember, I had far fewer failures due to mode B. You can see that from this key here: the triangle means censored, and I have a lot of censored data points. Even with that much censoring, I can still do the extrapolation to use conditions. The median time to failure here is more like 10 to the 25th hours. That's much, much higher than for mode A. Again, we would expect mode A to dominate at lower temperature. Then we also have our distribution profiler. This also tells us something. We're analyzing these two failure modes separately. What that also means is that if mode A were completely eliminated, say I changed my design or changed how I process things and I'm able to eliminate mode A, all I'm left with is mode B, and if only mode B were active, the parts would be much more reliable. Now I can play games. Maybe I have a spec, and my spec is that I want to make sure the upper bound is no worse than 10 to the minus 3 in 5 years. Right now it's 10 to the minus 17, so I can use the profiler to ask, can I survive at 200 degrees? Well, now the upper bound is 4 times 10 to the minus 8. I can go a little higher, maybe 225; now it's more like 3 times 10 to the minus 5. I can go even a little higher, 240, and now we're getting something close to our spec of 0.001. This is actually good news. It helps point us in a direction for where we need to go with our reliability program and where we're going to get the most bang for our buck. This also helps us open up the spec. Maybe the customer would like to operate the part at a higher temperature. So now we can say, "Yeah, sure, go ahead. You don't have to keep it at 150.
You can operate much higher if you want to, and you're still okay." Also, JMP returns my Arrhenius parameters for mode B as well. The intercept term was very different, minus 100 or so. The activation energy here, you can see, is close to 6 eV, and it was about 2.4 eV for mode A. That's very different, and that's why the slope in my Arrhenius plot for mode B was much steeper. I also get the shape factor, and the shape factor was larger. Again, with these parameters, I can predict what the probability of failure is at any time and temperature of interest. Let's just summarize what we went over here. This is a summary table. I analyzed the data for mode A and mode B, and I got upper confidence bounds on those. It turns out that JMP does not return an upper confidence bound on the overall probability of failure, which is what I was after. It uses a Wald technique to get the upper confidence bound for each mode, but unfortunately, it's difficult to apply the Wald technique when I have two modes operating, so I used a different method, and when I did that, I got a slightly different answer. It's about what we would expect. Since mode A is dominant, we would expect that if both modes are operating, it's still going to be mostly mode A, so the upper bound should be something around 1 times 10 to the minus 3, and we're getting something in that ballpark, so it looks like this is working correctly. Let's move on and talk about planning a life test. We're going to change gears here. There's a lot in the literature and in textbooks on how to plan a life test for the Arrhenius case, where I'm testing over temperature, but less has been published for when there is more than one mode present. Of those publications, I think most are highly mathematical. That's fine as far as it goes, but I like a graphical approach; I think a graphical approach is more intuitive. What I did was simulate a whole bunch of failure times under different test conditions, take those results, calculate the upper confidence bound on the probability of failure, and then model that, so I could predict what the upper confidence bound would be for given test conditions or different assumptions for planning values, number of temperatures, things like that. I want to give a little background here so you all understand what we're talking about. Again, I'm going to assume I have two failure modes, A and B. These are generic, so I'm just calling them A and B. Each mode is going to have its own Z-score, z_A and z_B. I'll show you how to calculate the Z-score in a second. I did this to simplify my life. I didn't want to have so many free parameters like activation energies and shape factors and all these things. I found that if I could boil everything down to a Z-score, I didn't have quite so many things to vary, but I could still back out what the activation energy was, et cetera. That was very helpful, because I didn't have to simulate so many different values. There were other inputs as well, things like the number of parts per temperature, the number of stress temperatures, and so on. Here's how we calculate the Z-score. Typically, we have some industry standard. We're interested in the time to failure after 5 years or 10 years or something like that. We also may have an industry standard for the use conditions.
We want to be able to tell the customer that they can use the part at 150 degrees C or 100 degrees C, something like that. These are known. Also, we're going to have to input some planning values. We're going to have to guess, or talk to subject-matter experts, to get estimates or an idea of what our Arrhenius parameters are. Then we just plug those values into our formula, and we get a Z-value, and that's going to be our Z-score. Since we have two failure modes and they each have their own Arrhenius parameters, I'm going to have two Z-scores, one for mode A and one for mode B. This is just a summary table showing that I varied all these parameters over a very wide range. I varied z_A and z_B over a wide range, and I used a wide range for the use temperature, the number of temperatures, et cetera. I also wanted to boil things down. I did not want to have a huge matrix of lots of different possible stress temperatures, so I made my life easy by boiling it down to just three quantities. For example, if I tell you what the lowest stress temperature was, T1, and I tell you that I had four temperatures, and I tell you what the spacing between those temperatures was, then you know everything; you can calculate what the four stress temperatures were. Rather than having a huge matrix of all these possibly different stress temperatures, I brought it down to just three parameters: the lowest stress temperature, the number of temperatures, and the spacing between those temperatures. Again, I also varied the sample size, the activation energies and shape factors, et cetera. Of all those possible test conditions, I picked 1,900 unique sets of planning values, and for each one of those 1,900 sets, I generated 500 data tables. For each table, I created randomly generated lognormal failure times: one column for mode A failures and another column for mode B failures, and then I took the minimum. That's what competing risks means. The two failure modes are competing to kill the part; whichever one has the lower failure time wins, and that's the failure time for that part. For each of the 500 data tables, I could calculate what Ftot was, because I know FA and FB, so I can put those into my formula and calculate the overall probability of failure. Since I had 500 of them, I have an idea of what the distribution of Ftot was for each test condition. I could use the quantiles: I could take the 97.5% quantile of the 500 values and use that as my upper confidence bound. Now I have 1,900 upper confidence bounds, one for each of those test conditions. Then I could take all that data, feed it into a neural network, and make a prediction. I could see what impact my test conditions, or my assumptions for the planning values, had on the upper confidence bound.
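Before we get to which factors mattered most, here is a minimal, self-contained sketch of one cell of that simulation, written in Python rather than in JMP. The intercepts, activation energies, sigmas, temperatures, and sample sizes below are hypothetical, with the mode B intercept chosen so the two modes have roughly the same median at the middle test temperature, mirroring the constraint described later in the talk. The per-table estimation step here is a plain maximum likelihood fit standing in for whatever the talk's simulation used, and no test-stop censoring is included, to keep the sketch short.

```python
# Sketch of one simulation cell: two independent lognormal Arrhenius modes
# compete, each mode is refit with the other mode's failures treated as
# right-censored, and the 97.5% quantile of Ftot across tables serves as the
# upper confidence bound. All numbers are hypothetical.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
k = 8.617e-5                                   # Boltzmann constant, eV/K

def neg_loglik(theta, logt, x, failed):
    """Censored lognormal-Arrhenius likelihood; x = 1/(k*T_kelvin)."""
    b0, Ea, sig = theta
    if sig <= 0:
        return np.inf
    z = (logt - (b0 + Ea * x)) / sig
    ll = np.where(failed, norm.logpdf(z) - np.log(sig), norm.logsf(z))
    return -ll.sum()

# hypothetical planning values; b0B is set so the two medians roughly
# coincide at the middle stress temperature (337 C)
b0A, EaA, sigA = -40.0, 2.4, 1.6
b0B, EaB, sigB = -108.5, 6.0, 2.0
temps = np.repeat([320.0, 337.0, 355.0], 20)   # 20 parts per temperature
x = 1.0 / (k * (temps + 273.15))
t_use, T_use = 43830.0, 125.0                  # 5 years at 125 C
x_use = 1.0 / (k * (T_use + 273.15))

ftot = []
for _ in range(500):                           # 500 simulated data tables
    tA = rng.lognormal(b0A + EaA * x, sigA)    # mode A failure times
    tB = rng.lognormal(b0B + EaB * x, sigB)    # mode B failure times
    t = np.minimum(tA, tB)                     # competing risks: earlier failure wins
    F = {}
    for mode, won, start in (("A", tA <= tB, (b0A, EaA, sigA)),
                             ("B", tB < tA, (b0B, EaB, sigB))):
        fit = minimize(neg_loglik, start, args=(np.log(t), x, won),
                       method="Nelder-Mead")
        b0, Ea, sig = fit.x
        F[mode] = norm.cdf((np.log(t_use) - (b0 + Ea * x_use)) / sig)
    ftot.append(1 - (1 - F["A"]) * (1 - F["B"]))   # series-system combination

print(np.quantile(ftot, 0.975))                # upper bound on Ftot at use conditions
```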
One question was, after I did this, which factors were most important? I used the Predictor Screening platform for that. I'll show that here. I have my predictors z_A and z_B, and I see that z_A and z_B showed up as being important in predicting the upper confidence bound about 80% of the time. Then N, the number of parts per temperature, showed up about 15% of the time. Between them, those three showed up 95% of the time. Now, there were other factors, too, and they did not show up as often; the percentages are much smaller. But I included them in my model anyway, because I found that varying them could still have an impact on the contours. It's smaller than for those top three, but it is an impact, so I wanted to be able to take it into account. Censoring time I ended up not using at all. That's because when I did the simulations, I assumed that the practitioner would choose temperatures and times where they would get some reasonable number of failures. They don't want to choose some temperature that's way too low or some censoring time that's way too short, because then you'd have 100% censored data, and you can't do much with that. Since I chose censoring times that were moderate, the censoring time did not really play much of a role, as expected, so I did not use it in my analysis. I did a neural network fit, and if you are familiar with neural networks, you know they are prone to overfitting. I used the technique that is typically used to alleviate that, the training, validation, and test approach. What does that mean? I randomly split my 1,900 rows: 50% went into the training subset, 25% went into validation, and 25% went into test. JMP does this internally: it fits a neural network using the training data, and then it tests how well it predicts the validation data, which was not part of the training data. The model has not seen that data, and it keeps making adjustments to the parameters fit to the training data until it does a better and better job on the validation data. There's also an internal way of making sure you don't overfit or underfit; the algorithm will stop if you start to overfit the data. Finally, you take that final model and put it to the acid test, the test group. The model has never seen that data at all, and you want to make sure it does a good job there, too, because then you can trust the model. I did that. For the training data, I had a nice high R-squared, 0.99, which is fine, but really the question is, how well does it do on data it hasn't seen? In this case, I found that for the validation set I got an R-squared of about 0.98, and even for the test set I got an R-squared of 0.96, so it's really doing pretty well. I would have been concerned if one of those two values had been very low, much lower than 0.99, because that would mean I was overfitting the data. That's not what happened. I got a very good fit, so I'm happy with this, and I'm going to go ahead and use it to make predictions. Let's go ahead and see how we would create a test plan with this. Let's assume that we have a corporate standard, and our corporate goal is to make sure that the upper confidence bound on the probability of failure is no worse than 1 times 10 to the minus 4 after 5 years at 125 degrees C. We want to make sure that we pick the right temperatures, the right number of temperatures, the right sample size, et cetera, so that at the end of the day, we have a pretty good likelihood of hitting that target. We're going to use historical values for the Arrhenius parameters to help us out. How do we do this?
Well, maybe you have previous experience: we have tested parts before, and we know, for example, that mode A dominates at low temperature, and we have an idea of what the activation energy and the shape factor are. Since we have tested here before, we can back out what the intercept term is as well, and then we just plug and chug. We know the time of interest is 5 years, we know the temperature of interest is 125 degrees C, and we know our Arrhenius parameters, or at least we're willing to guess them. We plug them in, and in this case, we see that z_A is minus 7. We're going to do a similar thing for mode B, though not exactly the same. We have an idea, or we assume, what the activation energy and the shape factor are. But to make my life easier, I introduced a constraint: if you're life testing and you have both modes present, I'm going to assume that at your middle temperature, the median time to failure for each mode is the same. That's because if we were testing several temperatures and mode B was not present, or only present at the last temperature, I would probably just throw that data out and use the mode A failures alone. But here I'm going to assume that modes A and B have the same median time to failure at the middle temperature. It also makes a certain amount of sense: if mode B was dominating, we would not expect some sudden shift in the median time to failure, going from mode B to mode A, at some particular temperature. It's probably not going to be a cliff; it's probably going to be some sort of smooth transition. If I know what's going on with mode A, and I know, or assume, the activation energy for mode B, then with that point fixed I can extrapolate to get the intercept term, where the line crosses the Y axis. Given that constraint, I can calculate what the intercept term is for mode B. Then I do the same thing: I plug and chug, put in the values I have for z_B, and I get out about minus 6. Let's see what that looks like. I can generate my contour. Here we go. Let's put our values in here: z_A was minus 7 and z_B was about minus 6, so I've got the crosshairs here. My goal was to make this 10 to the minus 4. I am shy of that goal. It's telling me it's minus 3 point something, and that's why the crosshairs sit somewhere between the minus 3 and minus 4 contours. We're not quite there. That means we're going to have to make an adjustment. Maybe we were wrong in our values or choices for z_A; maybe we can adjust those and pull this down so it crosses the minus 4 contour. Or maybe we can just increase the sample size. Right now, my sample size is 20. I can increase it to 40. Look, now I'm past the minus 4 contour, at minus 4.1. So if I use a sample size of 40, I can meet the goal. Let me put this back for a second. There are two other things we can get from this contour plot. One is that we can use it as a rough guide to sensitivity. We want to know what happens if z_A changes, say because my activation energy or my shape factor changes, et cetera. Right now I'm at minus 7, and the contour is minus 3.1. If I put this at, say, minus 5, then I get about minus 2.1. So if z_A decreases or increases by 2, my upper confidence bound changes by about an order of magnitude. That might be good to know.
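For concreteness, here is the plug-and-chug written out as a tiny script. This is an editor's sketch, not the speaker's worksheet: it reuses the rounded mode A case-study parameters quoted earlier (intercept about -40, activation energy about 2.4 eV, shape about 1.6) at the 150 degrees C, 5-year condition, and it lands in the same ballpark as the profiler figures quoted in the case study. The planning Z-scores above come from the same formula, just with the planning values and the 125 degrees C target.

```python
# Editor's sketch: lognormal-Arrhenius "plug and chug" with the rounded
# mode A parameters quoted earlier (b0 ~ -40, Ea ~ 2.4 eV, sigma ~ 1.6).
from math import exp, log
from scipy.stats import norm

k = 8.617e-5                              # Boltzmann constant, eV/K
b0, Ea, sigma = -40.0, 2.4, 1.6           # rounded values quoted from Fit Life by X
T_use, t_use = 150.0, 43830.0             # 150 C use temperature, 5 years in hours

mu = b0 + Ea / (k * (T_use + 273.15))     # lognormal location at the use temperature
print(exp(mu))                            # median life: roughly 1.6e11 hours
z = (log(t_use) - mu) / sigma             # the Z-score used for planning
print(z, norm.cdf(z))                     # z ~ -9.5, F ~ 1e-21, ballpark of the quoted 5e-22
```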
The other thing I can do here is use this graph for a single failure mode, even though I built it for two. To do that, let's say mode B doesn't really happen; the probability of failure due to mode B is really, really small. In that case, I can use a value of, say, minus 10 for z_B. I just put in minus 10 here, and put in a value for z_A, say back to minus 7. Now I can just pay attention to this axis and look to see where the contours cross this axis alone. I have some freedom. Maybe I can decrease my sample size or something like that. I can go down to 10 parts rather than 20 parts, because my goal is to hit that minus 4, and I'm right around the minus 4 mark if only mode A is active. It's just another way you can use these plots. Let me wrap up. We covered a lot of ground today. I analyzed a life test data set for GaN FETs, and we used the competing risk method to find the probability of failure for the two different modes. I also showed what could happen if we could eliminate one failure mode. For example, if we eliminate mode A and we're left only with mode B, the reliability would greatly improve, and it would allow us to, say, open up the spec. I also went over how to plan an accelerated life test when you have two failure modes present. I did that using a contour plot, which was the result of a neural network fit, and the fit was really pretty good. I showed how to use those contour plots, and I also showed that they can be used for a simple sensitivity analysis, to see what effect varying our planning values has on the upper confidence bound. And I showed that we can use the same plot for single modes; you don't have to use it only for two failure modes. That's it. Thank you very much.