Measurement system analysis (MSA) is very important in the semiconductor industry to estimate the quality of the measurements. Most MSA indicators, especially the precision to tolerance (P/T) ratio, implicitly assume a normal distribution, with +/- kσ covering a given percentage of the distribution. In the reference documents (AIAG MSA Manual), there are no alternative calculations for non-normal data, and it is difficult to find a simple method that adapts to parameters with very different distributions.

We present two methods with simple calculations that are distribution agnostic and cover the percentage of the distribution required by our confidence level. The first method uses the Bienaymé-Tchebychev inequality to properly define the number of standard deviations in a k-sigma type formula. The second method uses a calculation of half-standard deviation on the right and on the left to allow for better coverage in the case of an asymmetric distribution.

The two methods are applied to many electrical tests with JMP formulas and can be generalized to outlier detection and removal.

 

 

Hello everyone. My name is François Berland and I work for NXP Semiconductors in Toulouse, France.

And I am François Bergeret, from ippon learning in France. We are a statistical consulting company.

Today we're going to talk about measurement system analysis in the case of non-normal data. The agenda for this presentation is the following. First we're going to talk about the business context: I'm going to talk about automotive radar and what validation means in the world of semiconductors. Then I'm going to refresh everyone on measurement system analysis. Then I'm going to share some cases of non-normal distributions for which we have two solutions.

Solution one is based on the Bienaymé-Tchebychev inequality, and I'm going to give some recommendations for validation in that context. Solution two is based on the Half Standard Deviation. We're going to share calculations, applications, and JMP formulae, and we're going to propose a generalization of this method for outlier detection.

Let's talk about radar in automotive applications. Radar stands for radio detection and ranging. It's a key technology enabler for autonomous vehicles and for enhanced safety. Radar in the car provides the position, velocity, and angle of multiple targets in relation to the vehicle on the road. It has several advantages: it is inexpensive, it is robust, it can detect the presence of obscured objects, and unlike cameras it is not affected by weather and lighting conditions.

You can see in the picture some of the applications for short-, medium-, and long-range radar. Short-range radar can assist with park assist and cross-traffic alert. Medium-range radar can help with blind-spot detection, and long-range radar supports functions such as adaptive cruise control and automatic emergency braking, as examples.

Now, validation in semiconductor development. In automotive semiconductors, we use the V-model for development; the V stands for verification and validation. It is a way to develop new products. The left arm of the V consists of defining and analyzing the requirements; it is basically specifying the product to come. Then comes the implementation. The second arm of the V is where we evaluate whether the product meets the specification requirements. In other words, we are doing some testing on the products.

In automotive, validation refers to on-silicon testing. What is the role of validation? Validation is here to verify the product's conformance to the specified requirements. Validation typically includes on-silicon tests and measurements. Now let's talk about MSA, measurement system analysis. Why is a measurement system analysis important?

Let's consider this case, where we have a parameter with a lower spec limit and an upper spec limit, and the validation engineer measures a value: say we test a part and get the result where the cross is on this graph. The cross is within the specification limits, so we conclude that the test result is in specification. But then let's assume this distance, this range, is the accuracy of the measurement system you are using.

Well, if you have measured a value here, then due to the measurement uncertainty the true value of the product could actually be outside the spec limit. Even if the test result indicates it is in spec, because of the measurement accuracy the true value could in fact be out of spec. This is why MSA, assessing the measurement system accuracy or error, is so important.

There are two cases. A part can be really out of spec, but due to measurement error we actually measure it inside the spec. On the other side the true value of the part could be within the spec, but it could be measured outside of the spec due to the measurement accuracy.

The first case corresponds to a bad part that would be called good, which represents a customer risk or quality issue. The second case is a good part called bad, which is a producer risk and expected yield loss.

The MSA is largely based on the Gage R&R, which allows us to estimate the variation of a measurement system. Gage R&R stands for repeatability, which is the variation in the measurements taken by a single instrument or person, and reproducibility, which is the variation in the measurements across different instruments or persons.

The main metric for the measurement system analysis is the percentage P over T, for precision over tolerance, and it is expected to be less than 10%. We can express the measurement system variation as the total variance of the measurement, which is the sum of the reproducibility variance and the repeatability variance.

The formula for the metric P over T is six sigma of the measurement, obtained through the Gage R&R experimental study, divided by the spec width, expressed as a percentage. Six sigma is there because it is actually plus or minus three sigma, which for a normal distribution corresponds to a confidence level of 99.73%. But this is under the normal assumption.
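To make the arithmetic concrete, here is a minimal JSL sketch of that standard P over T calculation. The variance components and spec limits are made-up illustration values, not figures from the presentation.

    // Standard P/T under the normal assumption: 6*sigma_measurement over the spec width.
    // All numbers below are assumed for illustration only.
    varRepeatability   = 0.0004;                      // from a Gage R&R study (assumed)
    varReproducibility = 0.0001;                      // from a Gage R&R study (assumed)
    sigmaMeas = Sqrt( varRepeatability + varReproducibility );
    LSL = 9.0;  USL = 11.0;                           // assumed spec limits
    PoverT = 100 * 6 * sigmaMeas / (USL - LSL);       // expected to stay below 10%
    Show( sigmaMeas, PoverT );                        // about 6.7% with these values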

Here are some real cases of measurements performed in validation. Validation is not production: we have access to only a few parts, not thousands of products. We do not perform a full Gage R&R study as defined in the automotive industry measurement specifications. We usually have only one part, we do about 30 repetitions, and we do observe cases of non-normal distributions, as shown in these two examples where I plotted the normal quantile plot. You can see the goodness-of-fit tests, both Shapiro-Wilk and Anderson-Darling, which are highly significant for both parameters.

Here we are facing two cases of parameters that do not follow a normal distribution, but we still want to estimate the Gage R&R and the P over T. The question is: is the standard formula, six times the sigma of the measurement divided by the specification tolerance, still valid in the case of a non-normal data set? For that we propose two alternative solutions. The first one is based on the Bienaymé-Tchebychev inequality.

Let X be a random variable with a finite, non-zero variance. Then, for any real number k greater than zero, the probability that the variable is more than k times sigma away from the mean is less than or equal to one over k squared; in symbols, P(|X − mean| ≥ k·sigma) ≤ 1/k².

The amazing thing about this inequality is that it is distribution agnostic. We don't have to worry about the normality of the distribution.

Let's see what it gives in our cases. For k equals three, the maximum percentage beyond three standard deviations from the mean, per the Bienaymé-Tchebychev inequality, is about 11%. Beyond four sigma away from the mean, it can never be bigger than 6.25%. For five sigma it is 4%, and so on, up to the case of ten sigma, which guarantees that no more than 1% of the population will be beyond ten sigma from the mean.
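Just as a sanity check of those numbers, here is a tiny JSL sketch, not part of the authors' script, that evaluates the 1/k² bound for several values of k:

    // Maximum fraction beyond k standard deviations, per Bienaymé-Tchebychev: 1/k^2.
    For( k = 3, k <= 10, k++,
        bound = 1 / (k ^ 2);
        Write( "k = ", k, ": at most ", Round( 100 * bound, 2 ), "% beyond k sigma\!N" )
    );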

But let's remind ourselves that the standard P over T formula, based on six sigma, that is plus or minus three sigma for a normal distribution, still guarantees a coverage of roughly 90% (1 − 1/9 ≈ 89%) regardless of the distribution type. What I want to say is that if we apply the Bienaymé-Tchebychev inequality with three sigma, we still cover a confidence level of about 90% regardless of the distribution. That is a good thing, because we do not need such high confidence compared to production measurement systems; we are still in validation.

Yes, we are in validation; we are in development, not production. My recommendation would be to use k equals four in the formula to reach a confidence level, based on the Bienaymé-Tchebychev inequality, close to 95% (1 − 1/4² = 93.75%). Again, this is regardless of the type of distribution. It lifts the need to verify the normality of the data set, and in the case of a non-normal data set we can still assess the measurement system accuracy with a confidence level of about 95%, thanks to this inequality.

However, the inequality generally gives a conservative bound, since the distribution is unknown. Let us remember that in the case of the normal distribution, plus or minus three sigma covers 99.73%, which is much higher than what we are aiming for here. If a higher confidence level is required, we have a second solution. I will let François explain that to you.

Thank you, François.

We will try to have something a little more precise than the Bienaymé-Tchebychev bound, even though the Bienaymé-Tchebychev inequality is universal; it's a very nice result we can apply to electrical test data. The idea is what we call the Half Moment. The Half Moment is a measure of the dispersion of the data, but on one side only, either the right side or the left side of the distribution. By definition, with the Half Moment you can deal with distributions that are non-symmetric, and very often, of course, non-normal data means non-symmetric data. There is a right and a left Half Moment.

About the theory: I had a look at it, and it is quite complex, involving the complex number i, so I will not go into that theory here. But there is an alternative to the Half Moment, which is called the Half Standard Deviation, and it is very useful for the objective of NXP and François. What is it? Here you have the very simple formula of the Half Standard Deviation for the left sigma.

You may see that it is very similar to the formula of a standard deviation, but it uses only the points on the left, meaning the points lower than the mean. We calculate the sum of the squared differences between the points on the left and the mean. Note that y bar is the global mean; it is not a left or right mean, it is the global mean of all the data, which is quite natural. The right sigma is defined in the same way, on the right.

Once you have this calculation done, you can compute a prediction interval for any non-symmetric distribution. There is an additional assumption when we use this prediction interval formula: the Z values are the quantiles of the normal distribution. It may seem surprising, but here we assume that the distribution is non-symmetric overall, while on the right and on the left each side behaves somewhat like a normal distribution. For that reason, we can apply this standard formula with the Z of the normal distribution. But of course the prediction interval will not be symmetric, because for non-normal data sigma left and sigma right are usually not the same. That is the idea.
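Here is a minimal JSL sketch of the idea, not the authors' script: compute the left and right half sigmas around the global mean, then build the asymmetric prediction interval. The data vector is invented, and the choice of divisor (the count of points on each side) as well as the use of z = 3 are assumptions for illustration.

    // Half standard deviations around the global mean (illustrative data, assumed formula).
    y = [10.1, 10.2, 10.15, 10.3, 10.2, 10.25, 10.9, 11.4, 10.2, 10.1];
    m = Mean( y );
    yLeft  = y[Loc( y <  m )];                        // points below the global mean
    yRight = y[Loc( y >= m )];                        // points at or above the global mean
    sigmaLeft  = Sqrt( Sum( (yLeft  - m) :* (yLeft  - m) ) / N Rows( yLeft  ) );
    sigmaRight = Sqrt( Sum( (yRight - m) :* (yRight - m) ) / N Rows( yRight ) );
    z = 3;                                            // normal quantile for ~99.73% coverage
    LPL = m - z * sigmaLeft;                          // lower prediction limit
    UPL = m + z * sigmaRight;                         // upper prediction limit
    Show( m, sigmaLeft, sigmaRight, LPL, UPL );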

Now I go to the next slide. Here is an example with a non-normal distribution; this is electrical test data. The quantile plot shows non-normality, and we also see this in the histogram. We have a calculation at plus or minus three sigma, but with sigma left and sigma right. What do we see in this distribution?

The average is close to this value here. The skew of the distribution is mainly on the right, so sigma right is higher than sigma left. You see that the prediction interval, at an equivalent 99.7% level, is wider on the right than on the left.

What is also interesting is that this value is very close to the prediction limit, because sigma left is low. We can see that there is very little spread on the left, and this point is quite far away in comparison. We are able to model this point to some extent.

I'm going to show you in JMP in a moment. To continue and generalize the P over T ratio defined by François, we define a non-normal P over T ratio, which is just three sigma left plus three sigma right divided by the tolerance. It is essentially the same formula, but instead of six sigma in the numerator we have three left sigma plus three right sigma.
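A small hedged JSL sketch of that ratio, with made-up half sigmas and spec limits just to show the arithmetic:

    // Non-normal P/T: (3*sigma_left + 3*sigma_right) over the tolerance (assumed values).
    sigmaLeft  = 0.20;                                // measurement half sigma below the mean
    sigmaRight = 0.08;                                // measurement half sigma above the mean
    LSL = 9.0;  USL = 11.0;                           // assumed spec limits
    PoverT = 100 * (3 * sigmaLeft + 3 * sigmaRight) / (USL - LSL);
    Show( PoverT );                                   // 42% here; the larger tail dominates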

In the illustration here, the skew of the distribution on the left is very pronounced, so sigma left is clearly higher than sigma right. You see that the lower prediction limit, which we call LPL, is much further from the mean than the upper limit. With this we cover roughly 99% or more of the distribution regardless of normality; it works on any distribution. Like the Bienaymé-Tchebychev inequality, this is universal.

Next slide, another example, with a little additional challenge. Very often when you have a non-symmetric distribution, you have only one spec limit. Take the example of leakage current: there is just an upper spec limit, because the danger is on the high values of leakage. In this example, on the other hand, we have only a lower spec limit. In that case, of course, we shall not invent another limit: we have one spec limit, so we work with one spec limit. There is then an adaptation of the P over T formula in which the numerator is just three left sigma. When we say sigma in this presentation, it is always the measurement sigma. Three left sigma is divided by the equivalent of a half tolerance, the mean minus the lower limit. This is a standard formula, but what is different here is that we apply it with the left half sigma instead of the overall sigma.
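And the one-sided version, again a sketch with assumed numbers:

    // One-sided non-normal P/T with only a lower spec limit (assumed values).
    m = 10.0;                                         // global mean of the repetitions
    sigmaLeft = 0.05;                                 // measurement half sigma below the mean
    LSL = 9.0;                                        // the only spec limit
    PoverT = 100 * 3 * sigmaLeft / (m - LSL);
    Show( PoverT );                                   // 15% with these values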

This example is very interesting, once again on real data provided by NXP. What do we see? We clearly have a non-symmetric distribution, and sigma left is higher: the probability density is clearly more spread out on the left. It is not surprising that three sigma left, which gives what we call the lower prediction limit, is much further from the mean than the upper prediction limit.

Here, if we look at the non-normal P over T, it is just the distance between the mean and the lower prediction limit divided by the distance between the mean and the lower specification limit. In this example we get a 75% P over T. It means that the dispersion of the measurement system on the left covers more than half of the left tolerance. Of course it is a bad result, because in the automotive industry we need this to be lower than 10%. This is a very good example of what we can do.

Just for information, if we don't use the non-normal P over T but the standard formula instead, the result, which we can calculate with JMP and which I will show you, is 43%. That is clearly different from 75%.

Now I will do a short demo with JMP. I'm going to open two things. First, a file which is anonymized: what you see, X1, X2, X3 and so on, are the results of electrical tests, leakage current and so on. We will see that most of them are non-normal data. First of all we can have a look at some of these parameters. The first one, for example, is clearly non-normal. We can do a lot of things with JMP, of course.

For example, we can try to model it. JMP has a very wide collection of distributions. We see it's not normal, so why don't we try the Cauchy distribution? But the Cauchy distribution is not working so well in that case. Let's try something else, the exponential distribution. The blue curve is clearly not performing well at all. It's very painful work to try to fit a theoretical distribution to this empirical data.

In addition, even if I found a nice distribution fitting parameter X1, clearly it would not be the same for X3 or X4. Take X4, for example. Let's try a Cauchy: it's not working well at all, even if it's not so bad. We can also try something else, the beta for example; the beta will not work well either. In any case, it's very hard work to do that. A second option to model the distribution, and to calculate quantiles that could be the equivalent of six sigma in a P over T formula, is to use what JMP calls a fit smooth curve.

The smooth curve is a non-parametric model of the data that may work very well in some cases, but here, in the case of validation, we have two problems. The first problem is that we have very small samples; for example, in that case they have done only 15 repetitions, a very small sample size. We know that non-parametric estimation, especially for distribution estimation, needs big samples, and clearly in the context of validation it's not enough, so the smooth curve will not work very well. In addition, with this small sample the smoothing is very sensitive to these outliers, and we don't want to remove them: they are part of the measurement error, so we cannot remove them.

To conclude: with the standard approach, fitting a theoretical distribution is very difficult, and nonparametric fitting is not well adapted. We need another solution. The Half Standard Deviation is a simple solution, and NXP was looking for a simple solution without a lot of complex statistical calculation.

Let's look at this. I'm going to open a little JSL script that was developed at ippon. I run the JSL, and it asks me to open a file. Here I want to assess the measurement system for all these parameters, all these electrical tests. By default, I take a value of three; three plus three is six, of course. This is the default value.

What are we observing? This first distribution is not so far from normality; we see this in the quantile plot. We have all the distributions; this one is X15. If I go back to X1, clearly X1 is not normal, we can see this. X2 is not normal; it is clearly a right-skewed distribution. X4 is a discrete distribution, not really normal either. Maybe X9 could be close to normality. X10 is clearly not normal, and so on.

We have a set of parameters, electrical tests, and their distributions. Now we are going to enter spec limits. Let's start with X1. For X1 we know that the spec limits are 25 and 28, so I enter 25 and 28. We have two spec limits, and here the P over T ratio is 39%. Not so good. Not catastrophic, but not so good.

We have an illustration here of the P over T ratio, which is just the spread of the measurement variation, where sigma is only the measurement sigma, compared to what the customer expects, the tolerance interval USL minus LSL. The blue compared to the red is 39%. Not so good: an improvement plan would be needed to improve the measurement system here, but this is another topic.

Let's have a look now at X2, non-normal with a right skew of the distribution. We are going to enter a spec limit just for X2. The danger is on the right side, and there is only one spec limit, on the right. Let's enter the right spec limit, which is 0.2. You compare the interval between the mean and the blue line with the interval between the mean and the red line; here it is 34%, which is not so good. But I'm sorry, I made a mistake here: the spec is 0.5. Sorry for that.

Here we have a better illustration: this is the width of the measurement system variation compared to the spec. We have a correct, let's say acceptable, P over T of 12%. In any case, you proceed simply by entering a spec limit. Let me enter another one, on X10. As far as I remember, X10 has a lower spec limit; I first typed 30 by mistake, but it is ten. With the lower spec limit at ten, we have a very good P over T ratio.

Anyway, it's very easy with this calculation. We wrote a JSL script because JSL is very nice for interactivity and a graphical user interface, but the calculations are really simple to do; they are the ones presented in the PowerPoint.
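To give an idea of what such a calculation can look like end to end, here is a hedged JSL sketch, not the ippon script itself, that pulls one test column from the open data table and computes the non-normal P over T. The column name X1 and the spec limits 25 and 28 come from the demo; the rest is assumed.

    // Non-normal P/T for one electrical test column (sketch, not the original script).
    dt  = Current Data Table();
    y   = Column( dt, "X1" ) << Get Values;           // repeated measurements of X1
    LSL = 25;  USL = 28;                              // spec limits entered in the demo
    m   = Mean( y );
    yL  = y[Loc( y <  m )];  yR = y[Loc( y >= m )];
    sL  = Sqrt( Sum( (yL - m) :* (yL - m) ) / N Rows( yL ) );
    sR  = Sqrt( Sum( (yR - m) :* (yR - m) ) / N Rows( yR ) );
    Show( 100 * (3 * sL + 3 * sR) / (USL - LSL) );    // non-normal P/T in percent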

If we go back to the presentation and conclude: the first part was really interesting because the Bienaymé-Tchebychev inequality is really an insurance for any kind of distribution. You are sure that a good percentage of the distribution will be between the mean plus or minus k sigma. François showed that k equals four is already enough to get something close to 95% coverage of your distribution. If you want slightly more precise results, you can use the Half Standard Deviation.

I discovered this recently. It's very promising because it takes into account the non-symmetry of any distribution, and the first results on a number of test data sets are very promising as well. A perspective is to apply this to outlier detection, but maybe that will be the topic of a future presentation. I would like to thank Nikolaos Kourentzes for the idea of the Half Moment; he is a professor of statistics in Sweden working on these topics. And at ippon, thanks to Charly Marty for a very nice JSL script. That is all; thank you for your attention.

Thank you very much.

