In manufacturing, you may have multiple machines that perform the same process or produce the same output. When investigating problems, comparing the output of these "identical" machines is a common potential X on the fishbone for the problem. However, there are several mechanisms to compare such output, from comparison analyses such as t-tests and tests for unequal variances (and their associated equivalence tests), to MSAs, to orthogonal regression. Each option provides different information, makes different assumptions about the data, and uses different analysis techniques. It can be overwhelming for a new engineer to understand the application of each technique and how to interpret it. This paper discusses the different options, when you might use them, and the pros and cons of each.


Hello, everyone. My name is Dale Human. I am an architect engineer at GE Healthcare in Milwaukee, Wisconsin, United States.

Today, I'm going to talk about ways to compare systems. Generally, I'm going to focus on measurement systems, whether you're talking about measuring the output of a machining process or any process like that.

I'm focusing on industrial environments, but really these comparison techniques can be used in any context; I'll just focus more on an industrial context.

You can see here, my journal shows you the different topics we'll go through. Why do we want to compare systems in the first place? Since we're talking about measurement systems, you might think of using a measurement system analysis to do a comparison to see how those compare. We'll talk about that a little bit.

If you're familiar with statistical comparison tools such as T-tests and ANOVA, those sorts of tools, that might be another way you would think about doing a comparison. We'll look at that. Then we'll look at another technique that you might not be so familiar with: using a particular regression analysis to compare measurement systems.

All right, so let's get started.

First topic, why do we want to compare systems? Well, oftentimes in an industrial environment, we may have multiple copies of the same system measuring the same dimensions on parts, measuring the same output from a process. Here in this picture, you can see there are some measuring systems, looks like some CMM machines, for example.

If you have multiple machines that are all measuring the same output, you probably want to make sure that all of those measurement systems give you reasonably similar results so that you don't have differences being introduced purely by your measurement processes.

It could also be that you're bringing in a slightly different way of making the measurements. So today you use a CMM machine as shown here, for example, using touch probes. Maybe you want to move to an optical measurement system that is a little bit faster, for example. But you still want to make sure that if you measure the same dimensions using these two different systems, you get similar results between those two.

The goal of these comparisons is really to make sure that if I measure the same parts on either of these systems, I get the same results within an acceptable amount of error, of course, because there will be some measurement error.

That's what we're going to talk about. I have some notes here on what "identical" means, or what "similar" means. Really, that requires some discussion and thought with your team. You get a group of domain experts and discuss the differences that may exist between your systems, and understand what you think the effects of those differences will be on the output you observe from those systems.

You will think about things like software versions, for example. Is that going to make a difference in how things get measured? Do we run the same recipe or do they require different recipes to run on each of the different machines? And does that create some differences that you might be concerned about? If you have to use fixturing and so on, do you use the same fixturing across all the systems? Does each system have its own fixturing?

These are all things you're going to want to think about when you try to say, "I think these systems should be giving me identical output," identical in quotes, of course. But that's something you're going to want to spend time on with your domain experts.

Let's assume that you've gone through that process, and you have two machines or maybe three or four that you're trying to compare, and you want to be sure that they're giving you similar output.

All right. How might we do that?

As I mentioned previously, we're talking about measurement systems, so let's talk about measurement system analysis. Hopefully, you've done a measurement system analysis on your measurement system.

But even if you have multiple systems, you might think, "Well, let's do a Gage R&R on both systems and compare those Gage R&Rs," for example. That's something you might think about doing.

I've got a data table here that I've just popped open. If I turn on our column statistics here, you can see we've gotten 30 values: five parts measured by two operators, three times each, in a typical gage R&R design. We've measured on two different machines, so we have output from both machines here.

We might look at the measurement system analysis from this data to try to understand: are there differences? I'm going to go ahead and perform a gage R&R analysis, or measurement system analysis, from the Analyze > Quality and Process menu, going to Measurement Systems Analysis here, and then putting in the responses for the two machines. I'm going to do them simultaneously, so we'll see them both in one report.

Then put your operator in as a grouping factor and your part ID in the Part, Sample ID field. I'm keeping all of the other settings here at their defaults for now; you may have reasons to change them depending on your exact situation.

But for our purposes, let's just keep everything to the defaults, and I'm going to click Okay, and here, of course, we see some graphs, as JMP typically does. It starts with graphical output to let you look at your data to make sure there's nothing weird going on with it.

For now, though, I'm actually going to hide these graphs. If you're unaware, in JMP you can hold down the Ctrl key to broadcast a command. I'm going to hold down the Ctrl key and hide the Average chart and the Range chart; that does this for both the Machine 1 report and the Machine 2 report down below, just to make things a little bit simpler.

Then I'm going to hold down the Ctrl and Alt keys this time so that I can bring up the dialog of options for the Gage R&R or measurement system analysis here. To keep this one simple, I'm only going to turn on two things.

I want to look at the EMP results, for those of you that might use the EMP method for a measurement system analysis. But I'm also going to turn on the AIAG Gage R&R results, for those of you that tend to look at % tolerance and those metrics for your measurement system analysis.

When I click Okay there, you get a couple of reports for each of our different machines. Here's the output for Machine Number 1 in terms of the EMP (Evaluating the Measurement Process) method. If you're not familiar with EMP, the analysis looks at the different kinds of variation in your measurement system, as does the AIAG method.

Fundamentally, this ends up giving you a classification for your measurement system.

You can have a first class through a fourth class. The definitions of those classes are down here below. In this case, the report for Machine 1 says this is a second-class measurement system, meaning it's not quite as good as a first-class system. You have a little bit more variation introduced by your measurement process; the repeatability error, the test-retest error, might be a little higher than you might like. But that's what you look at from an output perspective.

Down below for Machine 2, if I scroll down there, here's the output. This one ends up being a first-class measurement system with a slightly smaller error: here it was 0.023, whereas up above we were at 0.035. So a little bit smaller error makes it a little bit better measurement system from a classification perspective.
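If you're curious what drives that classification, here is a minimal Python sketch of the EMP-style logic as I understand it from Wheeler's method: estimate the repeatability variance and the product variance, form the intraclass correlation, and bin it into the four classes. The class cutoffs, data layout, and readings below are illustrative assumptions, not JMP's exact computation.

```python
import numpy as np

def emp_class(measurements):
    """measurements: dict mapping part id -> repeated readings of that part."""
    cells = [np.asarray(v, float) for v in measurements.values()]
    var_e = np.mean([c.var(ddof=1) for c in cells])   # repeatability (test-retest)
    n_rep = np.mean([len(c) for c in cells])
    part_means = [c.mean() for c in cells]
    # Product variance: variance of part means, corrected for measurement error
    var_p = max(np.var(part_means, ddof=1) - var_e / n_rep, 0.0)
    icc = var_p / (var_p + var_e)                     # intraclass correlation
    for label, cutoff in [("First", 0.8), ("Second", 0.5), ("Third", 0.2)]:
        if icc >= cutoff:
            return icc, label + " Class"
    return icc, "Fourth Class"

# Illustrative data: 5 parts, 3 repeated readings each
rng = np.random.default_rng(0)
readings = {p: rng.normal(10 + p, 0.3, 3) for p in range(5)}
print(emp_class(readings))
```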

Similarly, if you're an AIAG organization, you might be interested in things like % tolerance. Here's our total gage R&R % tolerance row. On Machine Number 1, we're at about 24% tolerance: more than the 10% rule of thumb that we typically use, but less than 30%. So now you're probably going to want to do a little bit more investigation.

And if I look at the % tolerance for Machine 2 down below, I can see it's a little bit better, 15.8%. So still above 10% and less than 30%. You would probably have to do some further study to understand whether this is acceptable for our process or not.
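The % tolerance arithmetic itself is simple. A hedged sketch, assuming the common AIAG convention of a 6-sigma gage spread over the tolerance width (the sigma values and spec limits below are made up):

```python
# Hedged sketch of the AIAG % tolerance metric: the 6-sigma spread of the
# total gage R&R variation divided by the tolerance width.
def pct_tolerance(sigma_repeatability, sigma_reproducibility, usl, lsl):
    sigma_grr = (sigma_repeatability**2 + sigma_reproducibility**2) ** 0.5
    return 100.0 * 6.0 * sigma_grr / (usl - lsl)

# Illustrative numbers only
pt = pct_tolerance(sigma_repeatability=0.035, sigma_reproducibility=0.02,
                   usl=10.5, lsl=9.5)
verdict = "acceptable" if pt < 10 else ("marginal" if pt < 30 else "unacceptable")
print(f"% tolerance = {pt:.1f}% -> {verdict}")
```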

The question here, though, really is: how would you compare these two results? Is there anything here that allows you to say these systems are different enough that I care about it, or that these systems are the same?

And really, there's not an easy way to do that just by using measurement system analysis.

So that's where we move on to other techniques. I'm going to go on to the hypothesis testing approach to doing comparisons. So let me close these windows, get those out of the way, and let's talk about hypothesis testing.

If you've done anything like a T-test or an ANOVA, you've already done hypothesis testing. Those tests are designed to detect differences, and that may be what you're doing here. We're interested in: are these systems the same or not? Or, another way to think of it: are they different or not? You might think about doing a T-test or an ANOVA, something like that, in order to compare systems. Let's look at that.

I've got another data table here where we've gathered data from four different systems. Again, I'll open the column statistics.

You can see here we have 80 data points from four different systems with some range to them. The question becomes, "How can I determine whether or not these systems are the same?" I'll expand that a little bit for you here.

Because we have a response variable here, the value that we are interested in measuring, and we have a grouping factor, the system that a particular value was measured on, we can do a Fit Y by X analysis here.

You can go to the Analyze > Fit Y by X platform. Since we want to understand the relationship between the value that we observe and the system each of those observations was measured on, I will do a one-way ANOVA. I'm going to go ahead and do that. I'll bring that over here.

Again, so we start off with a graph. Personally, when I'm doing this kind of comparison where we have continuous data grouped by a grouping factor, I like to look at box plots.

Again, I'm going to hold the Alt-key down when I click the red triangle just to bring up the dialog of all the different options. I'm going to turn off the points and turn on box plots. This is all personal preference, of course. I just like to look at box plots when I have groups like this. My brain just likes the way that I can visually compare them. You might like to keep the points on, that's fine.

But here we have some box plots showing us the distribution of the data from each of these four systems. And visually, they all look pretty similar. The boxes are roughly the same height, meaning the middle 50% of the data is roughly the same spread.

Our whiskers are roughly the same length. Maybe System 4 has a little lower variation there, but generally, they look pretty similar, so that looks good. But we can go a step further and do an actual ANOVA, because we want to compare performance between these four systems.

Because we have more than two, we can't do a T-test, but we can do an ANOVA. I'm going to go to the red triangle and choose the ANOVA option here. If you're familiar with ANOVA, this is comparing the means of the four different systems, in this case, to see: is there a difference in average performance between these four systems?

That's essentially the question that we're trying to answer with ANOVA. When it runs, the green diamonds show you the 95% confidence intervals for the average value of each of the different systems. You can see that they're all fairly similar.

Again, System 4 may be shifted slightly. But when you come down to the analysis of variance table, you see a p-value right down here labeled as the probability greater than F. That's the p-value that's being calculated as part of an F-test here.

This is much greater than 0.05, or any reasonable choice of alpha that you might use. And so you would conclude there is no evidence of a difference between these systems.
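If you ever want to sanity-check this kind of comparison outside JMP, here is a minimal sketch with scipy; the four samples are simulated stand-ins for the demo table, not the actual data.

```python
# One-way ANOVA across four systems (simulated data, 20 points each).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
systems = [rng.normal(100, 5, 20) for _ in range(4)]  # 80 points total

f_stat, p_value = stats.f_oneway(*systems)
print(f"F = {f_stat:.2f}, Prob > F = {p_value:.3f}")
# A large p-value means no evidence of a difference in means, which, as
# discussed next, is not the same as showing the systems are the same.
```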

So that's a very common first step in comparing systems. But saying, "Okay, well, there's no difference, therefore these systems are the same," is not the correct approach. There are two things you want to think about. Number one: I've compared the means here, but that's not sufficient, because when you're comparing populations, you want to look not only at the central location of each population but also at its variation.

I've got a little chart here that highlights the idea, here are two distributions that both have the same mean, but they are clearly different distributions. They have different variations in them.

Probably in most contexts, you wouldn't consider these two to be the same. So far, we've compared the means by doing an ANOVA analysis, but you should also do a variation comparison if you're going to try to show that two populations are not different from each other. We can do that easily enough in JMP, of course.

JMP gives you the capability on the red triangle here. You can just go to the test for unequal variances on the red triangle in this same analysis. Now we've added that test here.

We can see the chart here is showing you the standard deviation of the sample data for each of our four different systems, and they're all very similar. Again, System 4 is a little bit lower, but not by too much. Now, of course, we've got p-values down here. There's, again, another list of p-values. JMP now provides several different kinds of p-values in this case.

So your team probably needs to figure out which one you would like to use. I'm going to focus on Levene's test here. It's a fairly standard test, robust to outliers and robust to non-normal data. So I'll focus on the p-value for Levene's test here.

And again, this is a p-value that is much larger than 0.05. We would conclude there is no difference in variation among these four systems. At this point, then, we've compared the average performance between our four systems, and we've compared the variation between the four systems, and concluded these things are not different.
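Again, as a sketch outside JMP: scipy's levene with center="mean" computes the classic Levene statistic, while its default center="median" is the more outlier-robust Brown-Forsythe variant; which of these matches JMP's report is worth checking in your version.

```python
# Levene's test for unequal variances on four simulated samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
systems = [rng.normal(100, 5, 20) for _ in range(4)]

w_stat, p_value = stats.levene(*systems, center="mean")
print(f"Levene F = {w_stat:.2f}, p = {p_value:.3f}")
# A large p-value: no evidence of a difference in variation either.
```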

Now we can say, "All right, no difference in mean, no difference in variation." These systems are the same. Again, not so fast. One more thing to keep in mind when you're doing statistical hypothesis testing, such as an ANOVA analysis, the phrase, "There is no difference," is not the same thing as saying they are the same.

That's something we learn when we're first learning about statistical hypothesis testing. We've shown that there's no evidence of a difference in both mean and variation between these systems. But that doesn't really mean that they're the same. It just means we can't say that they are different at this time based on the samples of data that we have.

We have to think of another thing we can do: can we say they are the same? Fortunately, there is a test for that also.

You may have seen equivalence testing. Let's look at equivalence testing. Equivalence testing uses hypothesis tests to allow you to say, "These things are the same," but really what you're saying is that things may be different, but they are not different enough that I care about that difference.

You have to be able to define a range of equivalence that you can use to say, "As long as the difference is smaller than this range, then for all practical purposes, we will consider these things to be equivalent to each other." That's what equivalence tests are doing for you.

We can do an equivalence test. I've got a copy here. Let's do it on the report that we started building over here. Again, on the same report, using the same data, now I'm trying to be able to state, "Can I consider these four systems to be equivalent within a range of equivalence that I define?"

On the red triangle, you can see there is an equivalence test menu. We'll just look at the means for now, just for this example. For the equivalence test for the mean, the goal is to be able to say, "The difference in mean performance across these systems is smaller than a value that I care about."

That's the idea of doing an equivalence test. Here we see the dialog for performing one. We're going to stick with the T-test because we want to compare means. If you had strongly skewed or strongly non-normal data, you might use the Wilcoxon option, which is based on ranks. But we're going to stick with the T-test for this purpose. We're hoping to demonstrate equivalence.

You can see there are a few options there. I won't go through all of them, but basically they're there to say, either things are equivalent or it is certainly less than or greater than, depending on which version you might choose. I'm going to stick with the equivalence option.

You can choose whether to use the pooled variance across all the systems or whether to assume unequal variances. Because we have just done the test for equal variances and shown that there is no difference over here, we can use the pooled variance test. I'll keep that and just click Okay.

I forgot to enter a margin, didn't I? Sorry about that. Let's go and do the equivalence test for the mean. Let's assume in this case that we've sat down with our domain experts and decided that as long as the values from these systems aren't different by more than 10, we would consider them to be equivalent. So whatever the units of measurement are, as long as the systems don't differ by more than 10, we will consider them equivalent.

We'll put a 10 in the difference box here and then click Okay. Then you get a report showing you the results of that equivalence analysis. This is actually using what is called the TOST approach, the two one-sided tests approach. So it's actually doing two hypothesis tests, but they're one-sided tests that show you whether or not you can consider the differences to be smaller than the range of interest you've entered.
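For those curious about the mechanics, here is a minimal sketch of the TOST logic for a single pair of systems, assuming a pooled-variance t statistic and a margin of 10; the data are simulated, and JMP's report runs this for every pair at once.

```python
# TOST: test H0: diff <= -margin and H0: diff >= +margin; if both reject,
# the difference lies inside (-margin, +margin) and we declare equivalence.
import numpy as np
from scipy import stats

def tost_pooled(a, b, margin):
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    sp2 = (((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
           / (na + nb - 2))                       # pooled variance
    se = np.sqrt(sp2 * (1 / na + 1 / nb))
    df = na + nb - 2
    d = a.mean() - b.mean()
    p_lower = stats.t.sf((d + margin) / se, df)   # one-sided: d > -margin
    p_upper = stats.t.cdf((d - margin) / se, df)  # one-sided: d < +margin
    return d, max(p_lower, p_upper)               # TOST p-value

rng = np.random.default_rng(2)
sys_a, sys_b = rng.normal(100, 5, 20), rng.normal(101, 5, 20)
diff, p = tost_pooled(sys_a, sys_b, margin=10)
print(f"diff = {diff:.2f}, TOST p = {p:.4f}")  # p < alpha: equivalent within the margin
```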

Here in this output, we have a table comparing the systems two at a time. Each pair is being compared, and then you also have a graph down below showing you the range of the difference.

If your goal is to demonstrate that things are equivalent, then what you're hoping to see is that the confidence intervals being calculated for the difference in means between each pair of systems fall within this blue band. The blue band represents your range of practical equivalence, and anything outside of that range you would say is different, based on the value you've entered for your difference.

Here we can see, of course, all of our comparisons are outside of that range, which basically means we cannot conclude that these systems are equivalent, and JMP is telling you that right over here.

It gives you an assessment for each of the different combinations of systems being compared. If the goal of this analysis was to show that these four systems are all equivalent, unfortunately, we have now shown that that's not true, at least within a margin of 10 units of our measurement, as indicated here.

Then we might have to do some additional analysis, or perhaps you're going to have to change your development path depending on what these results mean in your context, of course.

But this is another tool that you might use to try to demonstrate equivalence between systems: an equivalence test. In reality, it's really doing a difference test; it's just allowing you to say that if there is a difference, the difference is smaller than the value that we practically care about in the context of our process or our system. But it's still a difference test from that perspective.

All right. That is equivalence testing and hypothesis testing. Let me close these windows and come back to our journal here.

I've got some notes here on how you decide what value to use for that range of practical difference. There are a few things you might try.

Domain expertise is usually a very common approach. You get folks that have detailed knowledge of the thing you're analyzing, and you sit down and think about, from your perspective, "How different do things have to be for us to consider them to be different?" That's something you have to think about before you can do an equivalence test.

If you were successful at this approach and said, "Oh, we've demonstrated equivalence," now can you say that things are truly the same between systems?

And it turns out maybe not, because you might think about all of this and say, "Well, the question I really want answered is: if I measure a part on System A, and then I measure that same part on System B, do they give me the same value?"

Everything we've done up to this point does not guarantee that that's the case. Even if you show there is no evidence of a difference, even if you run an equivalence test and show that things are equivalent, the thing to keep in mind with these hypothesis tests is that they are comparing population statistics, mean, standard deviation, and so on.

These tests don't guarantee that on a part-by-part basis, you get the same value as you measure these parts. You can imagine a case where, here, I've got samples of data from two of these systems, and on average, we would consider them to be equivalent. These mean lines are very close to each other. They're smaller than the range of practical equivalence that we've identified. So we're saying things are equivalent.

But if you were to look at individual values, you could find that a part measured on Machine 2 gave you this value, 4.75 or so, but when you measure that same part on Machine 1, you get a value more like nine. We're not going to see this in a statistical hypothesis test, because those tests are looking at population summary statistics such as the mean and the standard deviation; they're not looking at individual values. In order to do that, we're going to go to regression analysis.
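Before moving on, here is a tiny made-up illustration of that trap: two samples whose means match exactly while the individual parts disagree badly.

```python
# Same parts measured on two machines (values are invented): the means
# agree perfectly, but the part-by-part differences are large.
import numpy as np

m1 = np.array([9.0, 4.8, 7.1, 5.9, 8.2])
m2 = np.array([4.75, 9.1, 5.8, 7.3, 8.05])
print(m1.mean(), m2.mean())      # both 7.0: a mean comparison sees no issue
print(np.abs(m1 - m2))           # yet individual parts differ by up to ~4.3
```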

This is the technique to use if you truly want to be able to show that things are the same on a part-by-part basis, within the error of the measurements that you have.

All right. Let's look at a set of data where we've measured the same set of parts now. Here's another data table where we've measured 20 parts on two different systems. It's the same 20 parts being measured on our old system that we've had for years and years, let's say, and now we've got this nice new system that's measuring the same thing, and we measure those same 20 parts on this new system.

The question is, "Can we these machines behave the same?" How do we do that? Well, you might think about if you plot this data relative to each other, because you're measuring the same parts and these measurements should be giving you the same values, you would expect to see a nice straight line between the two systems, generally with a slope of something near one, and hopefully an intercept somewhere near zero. Because if I measured 60 for a part on our old system, system, I should be getting a value of around 60 on our new system because I'm measuring the same parameter. And similarly, if I'm at 120 on the old system, I should be at about 120 on the new system.

So that's the idea.

You might think about doing a least squares regression, which is the very typical regression that a lot of us start with, to look at this.

And in this particular case, you can see here, "Well, if I just draw the line Y equals X, then I have an intercept of zero and a slope of one. Yeah, that seems to fit my data well."

But we know that there's some error in that data. Let's do a least squares fit, and I've done that here. You can see we've got another equation. Sorry, let me get my laser pointer.

Here we have a least squares regression, and we can see that, again, the intercept is near zero. It's not exactly zero, but it's small, and our slope is very near one. If you were to do this, you'd say, "This looks promising. My slope is one, my intercept is near zero. I should be able to say that these two systems give me the same values on a part-by-part basis across the range of values that I'm measuring."

Great start. The problem is that least squares regression does not account for errors in the proper way in this case.

If you've learned about least squares regression, you've probably learned about residual errors. Imagine on the left here, we have a bunch of data points, and we've done a best-fit line through them using least squares regression. All of the model diagnostics that you do as part of least squares regression look at the residual errors between your observations and the best-fit line, highlighted by these gray lines here.

In least squares regression, all of that error is always in the Y dimension. We assume the error is in the response variable; there is no error in the X variables of your regression.

But in a case like this, where we have two machines that are making measurements, of course there's measurement error in both of those machines. So what we'd really like to do is something like this, where our error is in both the Y direction and the X direction, because our old system that we've had for years and years has error in it, and so does our new system. Now, those errors may be the same, or they may be different. That's okay; you can account for that.

But you want to allow error in both of the variables of your model here.

What can we do to get that? It's called orthogonal regression, because your errors are being calculated orthogonal to the best-fit line that you have here. We can do that. All right.
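In symbols, with intercept a and slope b, the two fits minimize different quantities; the squared perpendicular distance from a point to the line picks up a 1 + b^2 denominator:

```latex
\text{Least squares: } \min_{a,b} \sum_i \left(y_i - a - b x_i\right)^2
\qquad
\text{Orthogonal: } \min_{a,b} \sum_i \frac{\left(y_i - a - b x_i\right)^2}{1 + b^2}
```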

So fortunately, when you have some data from two systems, you can do a Fit Y by X analysis. I'm going to again go to the Analyze > Fit Y by X platform. I'm going to put our new system in as our Y and our old system as our X here and click Okay.

And again, we see this chart here. We would like to be able to do our regression analysis to understand, "Can we say that these systems are the same?"

So I'm going to go to the red triangle. Normally, you might go to Fit Line for your typical least squares regression, but I'm actually going to go down to Fit Orthogonal down here. One thing to note about Fit Orthogonal is that you need to have an understanding of the relationship between the errors of these two systems. You might say, for instance, that you believe these systems have equal variation.

If you're bringing in a literally identical copy of the machine you use today, it probably is reasonable to expect that the measurement system variation between those two systems is the same. So you might choose equal variances, or you might specify a variance ratio.

And you might do this because you've done a measurement system analysis on both of these systems, and so you know the standard deviation from each measurement system, as shown here in the journal.

Once you have those standard deviations, you can take the ratio of their squares, the ratio of the variances, and specify that value if you've done that.
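Outside JMP, this kind of fit is often called Deming regression, and the slope has a closed form once you fix the error-variance ratio. A minimal sketch with simulated parts; here delta is the ratio of the Y-system error variance to the X-system error variance, and delta = 1 is the equal-variances case:

```python
# Orthogonal (Deming) regression via the closed-form slope.
import numpy as np

def deming_fit(x, y, delta=1.0):
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxx, syy = np.var(x, ddof=1), np.var(y, ddof=1)
    sxy = np.cov(x, y, ddof=1)[0, 1]
    slope = ((syy - delta * sxx
              + np.sqrt((syy - delta * sxx) ** 2 + 4 * delta * sxy ** 2))
             / (2 * sxy))
    intercept = y.mean() - slope * x.mean()
    return intercept, slope

rng = np.random.default_rng(3)
true_vals = rng.uniform(60, 120, 20)        # 20 parts across the range
old = true_vals + rng.normal(0, 1.5, 20)    # old system reading + its error
new = true_vals + rng.normal(0, 1.5, 20)    # new system reading + its error
b0, b1 = deming_fit(old, new, delta=1.0)
print(f"intercept = {b0:.2f}, slope = {b1:.3f}")  # hoping for ~0 and ~1
```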

For our purpose, I'm going to say equal variances: I expect the variation between these two systems to be the same, so I chose an equal-variance analysis here. JMP has fit an orthogonal regression line here. Again, it looks very similar to the least squares regression. In fact, I can go up here and turn on a least squares regression by choosing Fit Line. You can see this if you zoom in on these graphs; in the journal I have them blown up a little bit to make it easier to see.

They look very similar. They're right on top of each other, fundamentally. Let me go back to the other window.

But if you were to zoom in on these axes, so if I just do this and maybe grab just that little square as an example, you can start to see them maybe separating a little bit. There, it's a little bit more clear.

But fundamentally, these lines are right on top of each other. In this particular case, that's just saying, "Yeah, even though I have measurement error in both of these systems, because it is the same for both my Y and my X, my line isn't terribly different from the least squares regression line."

There are other cases where you will see them as very different lines. Even though they may look the same here, analytically, the orthogonal regression line is really the one you want to focus on, because it captures the error more appropriately when you have error in both your Y and your X variables.

You can see down below, when you do the orthogonal regression here, again, you get an evaluation of your slope and intercept. What you're really trying to show, if your goal is to demonstrate equivalence here, you're hoping, again, to see an intercept near zero.

Looks good in this case: negative 0.95. Then you're hoping your slope is one. More specifically, you really look at the confidence interval for the slope, which is shown here. As long as one is within that range, meaning the lower confidence limit is a number less than one and the upper confidence limit is a number greater than one, then you would say, "Yeah, the slope of this line could be one with reasonable confidence. Therefore, I will conclude that these systems are the same on a part-by-part basis."
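JMP reports an analytic confidence interval for the orthogonal slope. If you're working outside JMP, one illustrative alternative is a percentile bootstrap; this sketch reuses deming_fit and the simulated old/new readings from the sketch above.

```python
# Percentile bootstrap for the Deming slope: resample part pairs, refit,
# and check whether the 95% interval contains 1.
import numpy as np

def bootstrap_slope_ci(x, y, delta=1.0, n_boot=2000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    slopes = [deming_fit(x[idx], y[idx], delta)[1]
              for idx in (rng.integers(0, n, n) for _ in range(n_boot))]
    return np.percentile(slopes, [2.5, 97.5])

lo, hi = bootstrap_slope_ci(old, new)
print(f"95% CI for slope: [{lo:.3f}, {hi:.3f}]; contains 1? {lo < 1 < hi}")
```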

Again, that's really the key here: this technique allows you to say, on a value-by-value basis across the range of values that we're measuring, these systems give me the same values. That's the idea of orthogonal regression. A little different from least squares regression, but just as easy to do in JMP; again, it's just on the red triangle. That's great.

You just have to know a little bit about the relationship of the variances, the errors, in the measurement systems that you're trying to compare. But then you can do this analysis to conclude one way or the other: yes, we can consider these to be the same across this range, or no, we can't.

Then you might have to think about, "Well, why aren't they the same?" Of course, that's a possible outcome always. All right.

Okay. Those were the topics. We looked at a few different ways of comparing systems. Because we're talking about measurements, you might think about comparing measurement system analyses, but really, there's not an easy way to do that. Of course, you would still like to do those measurement system analyses to understand the accuracy and precision of each of the different measurement systems you have.

You might think about doing hypothesis testing, which really allows you to say, "I can't tell that things are different," but technically doesn't allow you to say things are the same. Then you might think about equivalence testing, which at least allows you to say, even if there is a difference, the difference is smaller than I care about. That would be equivalence testing.

Then, orthogonal regression, as we talked through at the end there, shows you on a point-by-point basis whether you would consider these systems to be giving you the same values or not; of course, that's where you might have to do some more thinking. That was it for today.

Let me know if there are any questions. Thank you very much.

Presented At Discovery Summit 2025


In manufacturing, you may have mutiple machines that perform the same process or produce the same output. When investigating problems, comparing the output of these "identical" machines is a common potential X on the fishbone for the problem. However, there are several mechanisms to compare such output, from comparison analyses such as t-tests, test for unequal variances, etc.(and their associated equivalence tests), using MSAs, to orthogonal regression. Each option provides different information, uses different assumptions for the data, and uses different techniques for the analysis. It can be overwhelming for the new engineer to understand the application of each technique and how to interpret it. This paper discusses the different options, when you might use them, and the pros and cons of each.

 

 

Hello, everyone. My name is Dale Human. I am an architect engineer at GE Healthcare in Milwaukee, Wisconsin, United States.

Today, I'm going to talk about ways to compare systems. Generally, I'm going to focus on measurement systems, whether you're talking about a machining system when you measure the output of the machining process or any process like that.

I'm focusing on industrial environments, but really this comparison techniques can be used in any of context, but I'll focus more on an industrial context.

You can see here, my journal shows you the different topics we'll go through. Why do we want to compare systems in the first place? Since we're talking about measurement systems, you might think of using a measurement system analysis to do a comparison to see how those compare. We'll talk about that a little bit.

If you're familiar with statistical comparison tools such as T-tests and ANOVA analysis, those sorts of tools, that might be another way you would think about doing comparison. We'll look at that. Then we'll look at another technique that you might not be so familiar with, but using a particular regression analysis for doing comparisons between measurement systems.

All right, so let's get started.

First topic, why do we want to compare systems? Well, oftentimes in an industrial environment, we may have multiple copies of the same system measuring the same dimensions on parts, measuring the same output from a process. Here in this picture, you can see there are some measuring systems, looks like some CMM machines, for example.

If you have multiple machines that are all measuring the same output, you probably want to make sure that all of those measurement systems give you reasonably similar results so that you don't have differences being introduced purely by your measurement processes.

It could also be that you're bringing in a slightly different way of making the measurements. So today you use a CMM machine as shown here, for example, using touch probes. Maybe you want to move to some an optical measurement system that is a little bit faster, for example. But you still want to make sure that if you measure the same dimensions using these two different systems that you get similar results between those two.

The goal of these comparisons is really to make sure that if I measure the same parts on either of these systems, I get the same results within an unacceptable amount of error, of course, because there will be some measurement error.

That's what we're going to talk about. I have some notes here on what does identical mean or what does similar mean. Really, that requires some discussion and thought with your team. You get a group of domain experts and discuss the differences that may exist between your systems and understand what do you think the effects of those differences will be on the output that you observe from those systems.

You will think about things like software versions, for example. Is that going to make a difference in how things get measured? Do we run the same recipe or do they require different recipes to run on each of the different machines? And does that create some differences that you might be concerned about? If you have to use fixturing and so on, do you use the same fixturing across all the systems? Does each system have its own fixturing?

These are all things you're going to want to think about when you try to say, "I think these systems should be giving me identical output," identical in quotes, of course. But that's something you're going to want to spend time on with your domain expert systems.

Let's assume that you've gone through that process, and you have two machines or maybe three or four that you're trying to compare, and you want to be sure that they're giving you similar output.

All right. How might we do that?

I mentioned previously, while we're talking about measurement systems, so let's talk about measurement system analysis. Hopefully, you've done a measurement system analysis on your measurement system.

But even if you have multiple systems, you might think? Well, let's do a Gage R&R on both systems and compare those Gage R&Rs," for example. That's something you might think about doing.

I've got a data table here that I've just popped open where you can see we have, if I turn on our column statistics here, we've gotten 30 values from five different parts between two operators. We have five parts measured by two operators, three times each in this case for a typical gage R&R design. We've measured on two different machines, so we have output from two different machines here.

We might look at the measurement system analysis from this data to try to understand, are there differences? I'm going to go ahead and perform a gage R&R analysis or measurement system analysis on the Analyze, Quality, and Process menu, going to measurement system analysis here, and then putting in the responses for the two machines, I'm going to do them simultaneously here. We'll see them both in one report.

Then putting your operator as a grouping factor and your part ID in the part sample-field. I'm keeping all of the other settings here at their defaults. For now, you may have reasons for change them depending on your exact situation.

But for our purposes, let's just keep everything to the defaults, and I'm going to click Okay, and here, of course, we see some graphs, as JMP typically does. It starts with graphical output to let you look at your data to make sure there's nothing weird going on with it.

For now, though, I'm actually going to hide these graphs. If you're unaware in JMP, you can hold down the control key to broadcast a command. I'm going to hold down the control key and hide the average chart and the range chart. That just does this for both Machine 1 output and the Machine 2 report down below. Just to make things a little bit simpler.

Then I'm going to hold down the CTRL and the Alt key this time so that I can bring up the dialog for the options that I have for Gage R&R or measurement system analysis here. To keep this one simple, I'm only going to turn on two things.

I want to look at the EMP results for those of you that might use the EMP method for a measurement system analysis. But I'm also going to turn on the AIAG Gage R&R results for those of you that tend to look at % tolerance in those metrics for your measurement system analysis.

When I click Okay there, you get a couple of reports for each of our different machines. Here's the output for Machine Number 1 in terms of the EMP process. If you're not familiar with the EMP process, the analysis is looking at the different kinds of variation in your measurement system, as does the AIAG process.

Fundamentally, this ends up giving you a classification for your measurement system.

You can have a first class through fourth class. The definitions of those classes are down here below. In this case for Machine 1, where I'm looking at Machine 1 in this report, it says this is a second class measurement system, meaning it's not quite as good as a first class system. You have a little bit more variation introduced from your measurement process. The repeatability error, the test-retest error might be a little higher than you might like, but that's what you look at from an output perspective.

Down below for Measurement 2, if I scroll down there, here's the output for Measurement 2. This one ends up being a first-class measurement system with a slightly smaller error. Here it was 0. 023, whereas up above we were at 0. 35, 0. 035. So a little bit smaller error makes it a little bit better measurement system from a classification perspective.

Similarly, if you're an AIAG organization, you might be interested in things like % tolerance. Here's our total gage R&R % tolerance row here. And on Machine Number 1, we're at about 24% tolerance, more than the 10% rule of thumb that we typically use, less than 30%. So now you're going to want to probably do a little bit more investigation.

And if I look at the % tolerance for Machine 2 down below, here I can see it's a little bit better, 15. 8%. So still above 10%, less than 30%. You would have to do some further study, probably, to understand is this acceptable for our process or not.

The question here, though, really is, how would you compare these two results? Is there anything here that allows you to say these systems are different enough that I care about it, or these systems are the same.

And really, there's not a easy way to do that just by using measurement system analysis.

So that's where we move on to other techniques. I'm going to go on to the hypothesis testing approach to doing comparisons. So let me close these windows, get those out of the way, and let's talk about hypothesis testing.

If you've done anything like a T-test or an ANOVA analysis, something like that, you've already done hypothesis testing, those tests are designed to detect differences, and that may be what you're doing here. We're interested in, are these systems the same or not? Or another way to think of it, are they different or not? You might think about doing some a T-test or a ANOVA analysis, something like that in order to compare systems. Let's look at that.

I've got another data table here where we've gathered data from four different systems. Again, I'll open the column statistics.

You can see here we have 80 data points from four different systems with some range to them. The question becomes, "How can I determine whether or not these systems are the same?" I'll expand that a little bit for you here.

Because we have a response variable here, the value that we are interested in measuring, and we have a grouping factor, the system that a particular value was measured on, we can do a "Fit Y by X-analysis" here.

You can go to the analyze Fit Y by X-platform. If we want to understand the relationship between the value that we observe and the system that we measure each those observations on, I will do an ANOVA one-way analysis. I'm going to go ahead and do that. I'll bring that over here.

Again, so we start off with a graph. Personally, when I'm doing this kind of comparison where we have continuous data grouped by a grouping factor, I like to look at box plots.

Again, I'm going to hold the Alt-key down when I click the red triangle just to bring up the dialog of all the different options. I'm going to turn off the points and turn on box plots. This is all personal preference, of course. I just like to look at box plots when I have groups like this. My brain just likes the way that I can visually compare them. You might like to keep the points on, that's fine.

But here we have some box plots showing us the distribution of the data from each of these four systems. And visually, they all look pretty similar. The boxes are roughly the same height, meaning the middle 50% of the data is roughly the same spread.

Our whiskers are roughly the same length. Maybe System 4 is a little bit better variation, a little lower variation there, but generally, they look pretty similar, so that looks good. But we can go a little step further, and we can do an actual ANOVA analysis because we want to compare performance between these four systems.

Because we have more than two, we can't do a T-test, but we can do an ANOVA analysis. I'm going to go to the red triangle and choose the ANOVA analysis here. If you're familiar with ANOVA analysis, this is comparing the means of the four different systems in this case, to see, is there a difference in average performance between these four systems?

That's essentially the question that we're trying to answer within ANOVA. It goes to the ANOVA, the green diamonds are showing you the 95% confidence intervals for the average value for each of the different systems. You can see that they're all fairly similar.

Again, System 4 may be shifted slightly. But when you come down to the analysis of variance table, you see a p-value right down here labeled as the probability greater than F. That's the p-value that's being calculated as part of an F-test here.

This is much greater than 0.05 or any reasonable choice of Alpha that you might use. And so you would conclude there is no evidence of a difference between these systems.

So that's a very common first step to comparing systems. The fact that we say, "Okay, well, there's no difference, therefore these systems are the same," that's not the correct approach. There are two things you want to think about. Number one is I've compared the means here, but that's not sufficient. Because when you're comparing populations, not only do you want to look at the central location of that population, but you also want to look at variation in populations.

I've got a little chart here that highlights the idea, here are two distributions that both have the same mean, but they are clearly different distributions. They have different variations in them.

Probably in most contexts, you wouldn't consider these two to be the same. So far, we've compared the means by doing an ANOVA analysis, but you should also do a variation comparison if you're going to try to show that two populations are not different from each other. We can do that easily enough in JMP, of course.

It gives you the capability on the red triangle here. You can just go to the test for unequal variances on the red triangle in this same analysis. Now we've added that test here.

We can see the chart here is showing you the standard deviation of the sample data for each of our four different systems, and they're all very similar. Again, System 4 is a little bit lower, but not by too much. Now, of course, we've got p-values down here. There's, again, another list of p-values. JMP now provides several different kinds of p-values in this case.

So your team probably needs to figure out which one you would like to use. I'm going to focus on the Lovine's test here. It's a fairly standard test, robust to outliers, robust and non-normal data. So I'll focus on the p-value for the Lovine's test here.

And again, this is a p-value that is much larger than 0.05. We would conclude there is no difference in variation between these two populations. At this point now, then, we've compared the average performance between our four systems, and we've compared the variation between the four systems and concluded these things are not different.

Now we can say, "All right, no difference in mean, no difference in variation." These systems are the same. Again, not so fast. One more thing to keep in mind when you're doing statistical hypothesis testing, such as an ANOVA analysis, the phrase, "There is no difference," is not the same thing as saying they are the same.

That's something we learn when we're first learning about statistical hypothesis testing. We've shown that there's no evidence of a difference in both mean and variation between these systems. But that doesn't really mean that they're the same. It just means we can't say that they are different at this time based on the samples of data that we have.

We have to think of a another thing we can do to say, can we say they are the same? Fortunately, there is a test for that also.

You may have seen equivalence testing. Let's look at equivalence testing. Equivalence testing is trying to use hypothesis tests to allow you to be able to say, "Things are the same," but really what you're saying is things are maybe different, but they are not different enough that I care about that difference.

You have to be able to define a range of equivalents that you can use to say, "As long as the difference is smaller than this range, then for all practical purposes, we will consider these things to be equivalent to each other." That's what equivalents tests are doing for you.

We can do an equivalence test. I've got a copy here. Let's do it on the report that we started building over here. Again, on the same report, using the same data. Now I'm trying to be able to state, "Can I consider these four systems to be equivalent within a range of equivalents that I define?"

On the red triangle, you can see there is an equivalence test menu. We'll just look at the means for now, just for this example. And so the equivalence test for the mean, the goal is to be able to say that, "The difference between mean performance across these systems is smaller than a value that I care about."

That's the idea of doing an equivalence test. Here we see the dialog for performing an equivalence test. We're going to stick with the T-test because we want to compare means. If you had strongly skewed data or strongly non-normal data, you might use a Wilcoxon test for ranking. But we're going to stick with the T-test for this purpose. We're hoping to demonstrate equivalence.

You can see there are a few options there. I won't go through all of them, but basically they're there to say, either things are equivalent or it is certainly less than or greater than, depending on which version you might choose. I'm going to stick with the equivalence option.

You can choose whether to use the pooled variance across all the systems or whether to assume unequal variances. Because we have just done the test for equal variances and shown that there is no difference over here, we can use the pooled variance test. I'll keep that and just click Okay.

I forgot to enter a margin, didn't I? Sorry about that. Let's go and do the equivalence test for the mean. Let's assume in this case, we've sat down with our domain experts, and we've decided that as long as the values in these systems aren't different by 10, then we would consider them to be equivalent. So whatever the units of measurement are, as long as each system is not different by more than 10, we will consider them equivalent.

We'll put a 10 in the difference box here and then click Okay. Then you get a report showing you the results of that equivalence analysis. Now, this is actually doing what is called the TOST approach, two one-sided test approach. So it's actually doing two hypothesis tests, but they're one-sided hypothesis tests to show you whether or not you can consider the differences to be smaller than the range of interest that you've entered.

Here in this output, we have a table comparing each of the two systems to each other. Each pair is being compared, then you also have a graph down below showing you the range of the difference.

If your goal is to demonstrate that things are equivalent, then what you're hoping to see is that these confidence intervals that are being calculated for the mean for each of the systems, between each system, is within this blue band. The blue band represents your range of practical equivalents, and anything outside of that range, you would say is different based on the value you've entered for your difference.

Here we can see, of course, all of our comparisons are outside of that range, which basically means we cannot conclude that these systems are equivalent, and JMP is telling you that right over here.

It gives you an assessment for each of the different combinations of the systems that are being compared. If our goal of this analysis was to show that these four systems are all equivalent, unfortunately, we have now shown that that's not true, at least within a margin of 10 units of our measurement as indicated here.

Then we might have to do some additional analysis, or perhaps you're going to have to change your development path depending on what these results mean in your context, of course.

But this is another tool that you might use to try to demonstrate equivalence between systems, is, use an equivalence test. In reality, it's really doing a difference test, but it's just allowing you to say that if there is a difference, the difference is smaller than the value that we practically care about in the context of our process or in the context of our system. But it's still a difference test from that perspective.

All right. That is equivalence testing and hypothesis testing. Let me close these windows and come back to our journal here.

I've got some notes here on, well, how do you decide what value to use for that range of a practical difference. There are a few things you might try.

Domain expertise is usually a very common approach. You get folks that have detailed knowledge of the thing that you're analyzing, and you sit down and think about from our perspective, "How different do things have to be for us to consider them to be different?" But that's something you have to think about before you can do an equivalence test.

If you were successful at this approach and said, "Oh, we've demonstrated equivalence," now can you say that things are truly the same between systems?

And it turns out maybe not, because you might think about all of this and say, "Well, the question I really want to know is if I measure a part on System A, and then I measure that same part on System B, do they give me the same value?"

Everything we've done up to this point does not guarantee that that's the case. Even if you show there is no evidence of a difference, even if you run an equivalence test and show that things are equivalent, the thing to keep in mind with these hypothesis tests is that they are comparing population statistics, mean, standard deviation, and so on.

These tests don't guarantee that on a part-by-part basis, you get the same value as you measure these parts. You can imagine a case where, here I've got samples of data from two of these systems, and on average, we would consider them to be equivalent. These mean lines are very close to each other. They're smaller than the range of practical equivalents that we've identified. So we're saying things are equivalent.

But if you were to look at individual values, you could find that the part was measured on the Machine Number 2, and it gave you this value, 4.75 or so. But when you measure that same part on Machine Number 1, you get a value more like nine. We're not going to see this in a statistical hypothesis test because those tests are looking at population summary statistics such as the mean and the standard deviation. They're not looking at individual values. In order to do that, we're going to go to regression analysis.

This is a technique, if you truly want to be able to show that things are the same on a part-by-part basis within the error of the measurements that you have, then you can use this technique to do that.

All right. Let's look at a set of data where we've measured the same set of parts now. Here's another data table where we've measured 20 parts on two different systems. It's the same 20 parts being measured on our old system that we've had for years and years, let's say, and now we've got this nice new system that's measuring the same thing, and we measure those same 20 parts on this new system.

The question is, "Can we these machines behave the same?" How do we do that? Well, you might think about if you plot this data relative to each other, because you're measuring the same parts and these measurements should be giving you the same values, you would expect to see a nice straight line between the two systems, generally with a slope of something near one, and hopefully an intercept somewhere near zero. Because if I measured 60 for a part on our old system, system, I should be getting a value of around 60 on our new system because I'm measuring the same parameter. And similarly, if I'm at 120 on the old system, I should be at about 120 on the new system.

So that's the idea.

You might think about doing a least squares regression, the very typical regression that a lot of us start with, to look at this analysis.

And in this particular case, you can see here, "Well, if I just have the line Y equals X, then I have an intercept of zero and a slope of one. Yeah, that seems to fit my data well."

But we know that there's some error in that data. Let's do a least squares fit, and I've done that here. You can see here we've got another equation. Sorry, if I can get my laser pointer.

Here we have a least squares regression, and we can see that, again, the intercept is near zero. It's not exactly zero, but it's small, and our slope is very near one. If you were to do this, you'd say, "This looks promising. My slope is one, my intercept is near zero. I should be able to say that these two systems give me the same values on a part-by-part basis across the range of values that I'm measuring."
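As an illustration outside JMP, here is a minimal Python sketch of that least squares check. The names old_sys and new_sys, and the data, are hypothetical stand-ins for the 20-part table in the demo; only the shape of the check matters.

```python
# A minimal sketch of the least squares check on made-up paired data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
true_dim = rng.uniform(60, 120, size=20)          # 20 parts across the range
old_sys = true_dim + rng.normal(0, 1.0, size=20)  # both systems add error...
new_sys = true_dim + rng.normal(0, 1.0, size=20)  # ...to the same true value

fit = stats.linregress(old_sys, new_sys)
print(f"slope = {fit.slope:.3f}, intercept = {fit.intercept:.2f}")
# Hoping for: slope near 1, intercept near 0.
```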

Great start. The problem is that least squares regression does not account for errors in the proper way in this case.

If you've learned about least squares regression, you've probably learned about residual errors in your regression. Imagine on the left here, we have a bunch of data points, and we've done a best-fit line through those using least squares regression. All of the model diagnostics you do as part of least squares regression look at the residual errors between your observations and the best-fit line, highlighted by these gray lines here.

In least squares regression, all of that error is always in the Y dimension. We assume the error is in the response variable. There is no error in the X variable of your regression.

But in a case like this, where we have two machines that are making measurements, of course there's measurement error in both of those machines. So what we'd really like to do is something like this, where our error is in both the Y direction and the X direction, because our old system that we've had for years and years has error in it, and so does our new system. Now, those errors may be the same, or they may be different. That's okay. You can account for that.

But you want to allow error in both variables of your model here.

How can we do that? With a technique called orthogonal regression. It's called orthogonal regression because your errors are calculated orthogonal to the best-fit line that you have here. We can do that. All right.
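For intuition, here is a tiny Python sketch of what "orthogonal" means here: the residual of each point is its signed perpendicular distance to the fitted line, rather than the vertical gap used by least squares. The helper name is my own, not a JMP term.

```python
# Orthogonal residuals: signed perpendicular distance from each point
# (x_i, y_i) to the line y = intercept + slope*x.
import numpy as np

def orthogonal_residuals(x, y, intercept, slope):
    x, y = np.asarray(x, float), np.asarray(y, float)
    return (y - (intercept + slope * x)) / np.sqrt(1.0 + slope**2)
```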

So fortunately, when you have some data between two systems, you can do a Fit Y by X analysis. I'm going to again go to the Analyze > Fit Y by X platform. I'm going to put our new system as our Y and our old system as our X here and click OK.

And again, we see this chart here. We would like to be able to do our regression analysis to understand, "Can we say that these systems are the same?"

So I'm going to go to the red triangle. Normally, you might go to Fit Line for your typical least squares regression, but I'm actually going to go down to Fit Orthogonal here. One thing to note about Fit Orthogonal is that you need to have an understanding of the relationship of the error between these two systems. You might, for example, say that you believe these systems have equal variation.

If you're bringing in literally an identical copy to the machine you use today, it probably is reasonable to expect that the measurement system variation between those two systems is the same. So you might choose equal variances, or you might specify a variance ratio.

And you might do this because you've done a measurement system analysis on both of these systems, and so you know the standard deviation of each measurement system, as shown here in the journal.

Once you calculate those standard deviations, you can take the ratio of their squares, that is, the ratio of the variances, and specify that value if you've done that.
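As an illustration of what such a fit computes, here is a hedged Python sketch using the standard closed-form Deming estimator; I'm not claiming this is JMP's exact implementation. The delta parameter is that variance ratio just described, the assumed ratio of the Y system's error variance to the X system's, and delta = 1 gives the equal-variances orthogonal fit.

```python
# A sketch of the textbook closed-form Deming/orthogonal estimator.
import numpy as np

def deming_fit(x, y, delta=1.0):
    """delta = (error variance of y) / (error variance of x)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxx = np.var(x, ddof=1)
    syy = np.var(y, ddof=1)
    sxy = np.cov(x, y)[0, 1]  # sample covariance (ddof=1 by default)
    slope = (syy - delta * sxx
             + np.sqrt((syy - delta * sxx)**2 + 4 * delta * sxy**2)) / (2 * sxy)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept

# e.g., slope, intercept = deming_fit(old_sys, new_sys, delta=1.0)
```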

For our purpose, I'm going to say, "Equal variances." I expect the variation between these two systems to be the same, so I chose an equal variance analysis here. JMP has fit an orthogonal regression line here. Again, it looks very similar to the least squares regression. And in fact, I could go up here and turn on a least squares regression by choosing Fit Line. You can see, if you zoomed in on these graphs, which I do in the journal, I have them blown up a little bit to make it a little easier to see.

They look very similar. They're right on top of each other, fundamentally. Let me go back to the other window.

But if you were to zoom in on these axes, so if I just do this and maybe grab just that little square as an example, you can start to see them maybe separating a little bit. There, it's a little bit more clear.

But fundamentally, these lines are right on top of each other. In this particular case, that's just saying, "Yeah, even though I have measurement error in both of these systems, because it is the same for both my Y and my X, my line isn't terribly different from the least squares regression line."

There are other cases where you will see them as very different lines. Even though they may look the same, analytically, the orthogonal regression line is really the one you want to focus on because that one's capturing the error more appropriately in this case because you have error in your Y and your X variable.

You can see down below, when you do the orthogonal regression here, you again get an evaluation of your slope and intercept. If your goal is to demonstrate equivalence here, you're hoping, again, to see an intercept near zero.

Looks good in this case, negative 0.95. Then you're hoping your slope is one. More specifically, you really look at the confidence interval for the slope, which is shown here. As long as one is within that range, so the lower confidence limit is a number less than one and the upper confidence limit is a number greater than one, then you would really say, "Yeah, the slope of this line could be one with reasonable confidence. Therefore, I will conclude that these systems are the same on a part-by-part basis." You can make the same kind of check on the intercept, asking whether its interval contains zero.
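JMP reports that interval for you. Purely as an illustration of one place such an interval could come from, here is a hedged Python sketch: a percentile bootstrap over the paired parts, reusing the hypothetical deming_fit helper from the sketch above. This is not JMP's method, just a sketch.

```python
# A percentile-bootstrap interval for the orthogonal slope: resample the
# paired parts with replacement, refit, and take the middle quantiles.
import numpy as np

def bootstrap_slope_ci(x, y, delta=1.0, n_boot=2000, alpha=0.05, seed=0):
    x, y = np.asarray(x, float), np.asarray(y, float)
    rng = np.random.default_rng(seed)
    slopes = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(x), len(x))  # resample parts w/ replacement
        s, _ = deming_fit(x[idx], y[idx], delta)
        slopes.append(s)
    return np.quantile(slopes, [alpha / 2, 1 - alpha / 2])

# lo, hi = bootstrap_slope_ci(old_sys, new_sys)
# Part-by-part equivalence looks plausible when lo < 1 < hi.
```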

Again, that's really the key here: this technique allows you to say, on a value-by-value basis across the range of values that we're measuring, these systems give me the same values. That's the idea of orthogonal regression. A little different from least squares regression, but just as easy to do in JMP. Again, it's just on the red triangle. That's great.

You just have to know a little bit about the relationship of the variances or the errors in the measurement systems that you're trying to compare. But then you can do this analysis to conclude one way or the other, right? Yes, we can consider these to be the same across this range or not.

Then you might have to think about, "Well, why aren't they the same?" Of course, that's a possible outcome always. All right.

Okay. Those were the topics. We looked at a few different ways of comparing systems. Because we're talking about measurements, you might think about comparing measurement system analyses, but really, there's not an easy way to do that. You would, of course, still like to do those measurement system analyses to understand the accuracy and precision of each of the different measurement systems you have.

You might think about doing hypothesis testing, which really allows you to say, "I can't tell that things are different," but technically doesn't allow you to say things are the same. Then you might think about equivalence testing, which at least allows you to say, even if there is a difference, the difference is smaller than I care about. That would be equivalence testing.

Then, orthogonal regression, as we talked through at the end there, is showing you on a point-by-point basis that I would consider these systems to be giving me the same values or not, of course. That's where you might have to do some more thinking about it. That was it for today.

Let me know if there are any questions. Thank you very much.


