
That Can’t Be My Weight! Is My Scale Broken? An Exploration of Measurement Systems Analysis (MSA) (2021-US-45MP-826)

Level: Intermediate


Jason Wiggins, Sr. Systems Engineer, JMP


From our homes to the lab to the production floor, measurement systems are everywhere. At home, there is the dreaded bathroom scale. In test labs, a variety of systems are used to measure important physical properties like material strength. Production systems rely on measurements of quality characteristics such as the shape of a machined part. The measurement systems we interact with have a variety of behaviors. They may be noisy or precise, accurate or biased. They may or may not alter or destroy our samples. Regardless of behavior, we need to know enough about the noise in our measurement systems to make informed choices about how they are used to avoid costly errors. In the early '90s the Hubble telescope's main mirror was overground due to an errant measurement. The mistake, discovered post launch, nearly sidelined the project. Understanding and addressing the source of variation through an MSA pre-launch could have saved NASA more than $600 million for a trip to space to repair it. This presentation will cover the ins and outs of applying MSA results from the Evaluating the Measurement Process (EMP) method pioneered by Dr. Donald Wheeler. An overview of EMP along with examples from home and industry will be explored. EMP gauge classification, reporting precision and guard banding specification limits to ensure product conformance will be demonstrated in JMP.



Auto-generated transcript...




Olivia Lippincott Again, the recorder is on. I'll go on mute and I'll come back on.
Jason Wiggins, JMP Right. Hello, everyone. It's my pleasure to talk to you today about that darn bathroom scale.
  So for me it's hard to imagine life without tools that measure stuff, and measuring stuff is a very human characteristic. We just do that.
  I find a lot of happiness in measurement systems that are well behaved, and most certainly, I get frustrated when they're not.
  There is a cost to developing, refining, and even correctly using a measurement system, whether it's simple or complicated, there is a cost associated with that.
  In my experience, this expense is small when compared to the consequential costs of using data from a faulty gauge or even misusing data from a good one.
  So a great example of this is the Hubble space telescope.
  An aberration in the primary mirror was discovered not long after launch in 1990.
  The aberration impacted the clarity of the telescope's images; in fact, this is a before and after.
  The source of the problem was traced back to a miscalibrated piece of equipment that was used to grind the mirror.
  The repair mission cost over $620 million. That's 1993 dollars; that's quite a lot of money.
  And I believe this cost could have been avoided completely had a measurement systems analysis been done prior to grinding the mirror.
  Now while the Hubble Space Telescope repair is a great example of a high-cost measurement mishap, issues with measurement systems, even with lower stakes, I feel, can be equally as frustrating. So take, for example, my dreaded bathroom scale.
  Last year...December of last year, I started on a weight loss journey.
  And just to give you a sense for what I was seeing over time and then from day to day:
  I would routinely get four-pound jumps in weight. Now I think if I'm out here, near my target weight...hovering around my target weight, maybe, you know, a plus or minus four-pound fluctuation
  isn't an unrealistic thing to expect. Now emotionally, something different happens early on in this process, and
  we'll just kind of narrow in and talk through this a little bit. So here's me, I'm working very diligently to dump weight, doing all the right things.
  And then I wake up one morning and I'm four pounds heavier. So that's frustrating, right? And hopefully that's something that others can relate to as well, but my question
  throughout this (I probably should have done a measurement systems analysis before I started this journey), but my question when I would see stuff like that
  is how much of that four pounds is me and how much of it is noise in the measurement system? So I ran a measurement systems analysis to answer the question. Now there's a spoiler alert here.
  The first study that I did wasn't very good. The gauge is abysmal. I would have thrown it away, but I like to recycle my electronics.
  At the end of the day, this was not a good, usable gauge.
  Part of my goal today is, I mean, this is a fun exploration of a real-life measurement system thing that, you know, maybe we can all relate to, but one of my other objectives is to highlight some of the benefits
  of using the EMP method for measurement systems analysis. Now to do this, I'll cover some basic definitions and then demonstrate the method using
  results from an MSA from a more well-behaved gauge. Now rest assured, we will get back to my abysmal horrible bathroom scale example here in just a minute.
  Alright, so first off, what is measurement systems analysis? For me, a simple explanation of this is that, you know, we're going to run an experiment, and what we want to get out of the experiment is to identify or measure measurement system limitations.
  What are our goals? Well, if we do a measurement systems analysis, we can certainly determine whether the measurement system we're using is good enough
  to use or whether it needs improvement. But I feel more important than that, and where the EMP method shines,
  is that we want to set standards for using the measurement data. There's no such thing as a perfect measurement system, so what are some standards that we can put in place that help us use our measurement data effectively?
  How's it done? It's a variance components analysis, so we are looking for the variance components associated with the measurement system,
  and, hopefully, those are small compared to the variation associated with our parts. So we have some total variation of all of our measurements, and then there are these different components of variation that we need to understand.
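  As a rough sketch of what a variance components analysis computes, here is a minimal Python illustration for a balanced crossed study (every operator measures every part, with repeated measurements). This is a method-of-moments sketch for illustration only, not JMP's implementation; the function and variable names are my own.

```python
# Illustrative variance components for a balanced crossed gauge study.
# Assumption: y has shape (parts, operators, replicates).
import numpy as np

def gauge_variance_components(y):
    p, o, r = y.shape
    grand = y.mean()
    part_means = y.mean(axis=(1, 2))   # one mean per part
    oper_means = y.mean(axis=(0, 2))   # one mean per operator
    cell_means = y.mean(axis=2)        # part x operator cell means

    # Mean squares for each source of variation
    ms_part = r * o * ((part_means - grand) ** 2).sum() / (p - 1)
    ms_oper = r * p * ((oper_means - grand) ** 2).sum() / (o - 1)
    interaction = cell_means - part_means[:, None] - oper_means[None, :] + grand
    ms_po = r * (interaction ** 2).sum() / ((p - 1) * (o - 1))
    ms_e = ((y - cell_means[:, :, None]) ** 2).sum() / (p * o * (r - 1))

    # Method-of-moments estimates; negative estimates are truncated to 0.
    return {
        "repeatability": ms_e,
        "operator*part": max(0.0, (ms_po - ms_e) / r),
        "operator": max(0.0, (ms_oper - ms_po) / (p * r)),
        "part": max(0.0, (ms_part - ms_po) / (o * r)),
    }
```

The thing we hope to see is exactly what the talk describes: the "part" component dominating, with repeatability and the operator terms small by comparison.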
  So what is EMP? It stands for evaluating the measurement process. It's pioneered by Dr. Donald Wheeler. There's an awesome book that I'd highly recommend getting.
  For me, I think, probably the simplest way to think about EMP is that it's providing a statistical approach for using measurement data effectively. So it's not enough to know whether my gauge is good or bad,
  so first class, second class, third class; those classes are based on the probability of detecting a shift in the process.
  The old method, the one that I learned first, the GRR, percent R&R, the AIAG method, uses arbitrary thresholds for gauge classification. So here's a problem with that that I encountered, and it was what actually brought me to EMP to begin with.
  Many years ago, I was working with a destructive test, so this is a measurement system.
  And we did a measurement systems analysis, and it turned out to be a fourth-class gauge by the percent R&R standards. Well...well,
  you know, that's something, but what is that telling us? What made that difficult to stomach was the fact that we were using this destructive test
  to make decisions in product development that were generating revenue. It was a useful gauge, but the percent R&R told us nothing about the utility of our gauge, and it didn't give us anything in terms of
  recommendations for how we might use that gauge. So I kind of feel that the percent R&R approach really lacks the information I need to use the measurement data effectively.
  All right, a less frustrating, more well-behaved gauge example. This is the coordinate measurement machine.
  It measures the geometry of objects. This is a very simple experiment where we're just measuring one dimension: length.
  So we're measuring length of parts. As for noise variables, you'll notice this little joystick here; we have operator involvement in the measurement system. I've seen automated versions of this, and maybe then you wouldn't include operator, but in this case you do.
  We have variation associated with parts, and then this operator/part interaction. Now this interaction term, for those of you who love DOE,
  we all know that interactions happen everywhere, but I'll do my best to demonstrate an example of a very common operator/part interaction. So what I'm holding is a little
  metal cylinder and I have a dial caliper.
  I'm going to
  get a measurement on the diameter of this part and I see it's about 19 millimeters.
  If I knew what the size of this was, or I had an expectation of what that diameter should be, I might actually apply more or less pressure to the dial on my gauge
  so that I hit my expected measurement. Now this happens everywhere. I catch myself doing it. These types of things happen every time we as humans interact with measurement systems, so
  if you have the money and the budget to build an experiment to look for this, I most certainly recommend that.
  Okay, JMP 16. This is kind of a fun thing. I love this new tool. There is an
  MSA-specific experiment design tool in the DOE menu for JMP 16.
  I'll show you, kind of walk through a demonstration of this, but before I do, I kind of want to have a little discussion about randomization.
  For all of us that do DOE and measurement systems analysis, we know we randomize because we want to reduce the risk of lurking variables impacting our results. So that's kind of a given, right?
  If we don't completely randomize, we're going to potentially run into some issues.
  What I would also recommend is to make the operator blind to the random run orders. So imagine me and my caliper, I'm measuring 10 different parts and I don't ever know which part it is that I'm measuring, so there's less of an opportunity
  for me to bias the experiment, the measurement systems analysis experiment, just by knowing that information. So randomized, make it blind if you possibly can.
  If you run into randomization issues, you could possibly look at a split plot design or some other means of dealing with the randomization problem, but we definitely want to randomize to the extent we possibly can.
  Okay, let's take a look at it. DOEs, Special Purpose, there's our MSA design.
  Just like all the other DOE tools in JMP, I can, you know, type in the name of my factors, and maybe I need to add another factor for operator here.
  One thing I really like here is that I can give this an MSA role, and this gets copied as a column property in the data table that we're going to generate.
  So let's say operator has a different name, but I want to have at least some anchor to the fact that I'm using that as an operator role in the MSA, and I can do that. That's pretty awesome.
  In this experiment, we have four different parts, four different operators, and we're going to run two replicates. So what this means
  is that I have my original copy, my first run through the experiment, and I'm going to repeat that two more times. So I get three measurements for each run in my original design.
  We're going to completely randomize, and let's take a look at the design. Much the same as DOE, we get a quick preview of our experiment design, can look at that,
  and generally make sure it makes sense to us, that it's what we wanted to do and what we're seeing is correct. What I really like about this tool is the design diagnostics, so a little bit different than what we see in the custom designer, for instance,
  or in the compare designs tool. There's a simulation running behind this, and we can play with it; maybe we adjust the variance components associated with my test, and the simulation runs in the background. And we can
  ask some questions. Is our experiment design set up well enough that we can tag a first-class gauge if that's what we have? So, really cool thing.
  I love it because it is very measurement systems analysis focused, and just like DOE, hey, if we need to add additional parts or additional operators, add additional replicates, you know, we can play with that within the platform without dumping to a table right away.
  All right, let's take a look at our standard MSA.
  I want to walk through some some basic graphical outputs. We'll do it kind of slow because we'll be repeating this process a few different times.
  For this measurement system, the graphical outputs that we get from the EMP report are actually really pretty cool. So for instance, this average chart,
  if I were to show the data here rather than the averages, that would be just like the result we would get from the variability/attribute gauge chart tool.
  So what it's telling us is that if most of my measurements are outside of my control limits, then we're very likely going to be able to detect differences between parts.
  Next is the standard deviation chart (I could also have chosen range for dispersion; we'll talk about this a little bit more in just a minute).
  This visualizes repeatability, so we want everything to be inside the limits here, and that's the case with this gauge. So, a few more.
  Here's one of my favorites. Again, the operator/part interaction. In this graph, lines crossing indicate that an interaction is possible.
  These lines don't have to stack on top of each other, and they can be gapped apart, but we're really hoping we don't get any crossing of lines in that report. Analysis of means, I like this; you can use it as a diagnostic tool.
  It's just telling us that operator one tends to measure low on average and operators two and three tend to measure high.
  And that may not necessarily be bad, but if it is, we might be able to drill into the source. You know, why is it that operator one measures low on average? Next,
  the test-retest error comparison. This is just showing whether there's an inconsistency in how each operator is measuring.
  So those are the graphical outputs. Let's talk a little bit about some additional EMP terms. The foundation of the gauge classification in EMP is the intraclass correlation: the proportion of the total variation that comes from the part.
  The within component is associated with our repeat measurements, and operator and operator/part are the biasing terms. We're really hoping those are small compared to the total, so higher is better on this statistic; one is perfect.
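  As a small illustration of how the intraclass correlation and the gauge classes fit together, here is a Python sketch. The class cutoffs (0.8, 0.5, 0.2) follow the EMP convention as I understand it; verify the exact thresholds against Wheeler's book or the JMP EMP report before relying on them.

```python
# Illustrative intraclass correlation and EMP gauge classification.
# Assumption: the 0.8 / 0.5 / 0.2 cutoffs match Wheeler's EMP classes.

def intraclass_correlation(var_part, var_operator, var_interaction, var_repeat):
    # Proportion of the total variation attributable to the parts
    total = var_part + var_operator + var_interaction + var_repeat
    return var_part / total

def emp_class(icc):
    if icc >= 0.8:
        return "First Class"
    if icc >= 0.5:
        return "Second Class"
    if icc >= 0.2:
        return "Third Class"
    return "Fourth Class"
```

An ICC near one means the measurement noise is a small slice of what you see on the chart; as the talk notes, the class also maps to the probability of detecting a process shift.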
  And if you're going to go through Wheeler's book and do hand calculations on things like intraclass correlation from variance components, definitely use the standard deviation chart as the dispersion type.
  Wheeler has a really good discussion in his book about when you might want to use the range, and we're going to see that with the bathroom scale measurement system. So if your precision is fairly refined, you probably
  could use the standard deviation dispersion type. If it's chunky, range is probably going to be the better approach.
  Probable error. We're going to see this in a few different places. There's a lot of utility around this and I took this from the book.
  Wheeler describes this as "No measurement should ever be interpreted as being more precise than plus or minus one probable error since measurements will err by this amount or more at least half of the time." So this statistic is going to be used in a couple of different places.
  Notably, we're going to use a measurement increment,
  which is a function of the probable error, to adjust our precision expectations for the measurement: how many decimal places do we want to report.
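  The probable error and measurement increment relationship can be sketched like this in Python. I am assuming Wheeler's definitions here: the probable error is 0.675 times the test-retest standard deviation, and the reporting increment should land roughly between 0.2 and 2 probable errors. Treat those bounds as something to check against the EMP book or JMP's Effective Resolution report.

```python
# Illustrative probable error and measurement-increment check.
# Assumptions: PE = 0.675 * sigma_e, and a reporting increment should
# fall roughly between 0.2 * PE and 2 * PE.

def probable_error(sigma_e):
    # sigma_e is the test-retest (repeatability) standard deviation
    return 0.675 * sigma_e

def increment_advice(increment, sigma_e):
    pe = probable_error(sigma_e)
    if increment < 0.2 * pe:
        return "too fine: drop digits"
    if increment > 2.0 * pe:
        return "too coarse: add digits"
    return "ok"
```

This is exactly the "drop digits" iteration shown later with the formula column: round, rerun, and stop when the check says "ok".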
  And we'll see another place where probable error is used: in guard banding, statistical guard banding of spec limits.
  Okay, so we'll just go down through the report, and then we'll actually walk through the steps in JMP.
  We get a realistic gauge classification, so this is saying second class. Wheeler's recommendations are you may decide to chart the measurement process, so I may want to run a control before I do this measurement.
  It's arguable. You know, it's again a budget conversation. Can we afford the time to do that? If yes, do it.
  If no, what's the cost of improving the gauge? Can we move it to a first-class gauge? How much do we think that might cost in a project to do that?
  Then there's this monitor classification legend. Spend a little bit of time with it; you can extract some guidelines for control charting...or charting
  the actual process. Effective resolution: for this gauge, we have to drop digits. I'll walk through how I go about doing that.
  Variance components, this is great. Part contains most of the variance, so that's a good thing, although we do still have some operator
  and some operator/part interaction. We saw that in the graphical analysis as well. Okay, let's...
  now I built an add-in, and right now I'm going to use it for demonstration purposes. It's a little buggy. If I can get some of those things resolved, I'll post it.
  But this is just launching the EMP platform, and then it's going to give me some other options associated with
  the EMP report. All right. So we'll just remember part and operator. I'll use standard deviation in this case.
  And there we go. Click OK. So these are all the things that we talked about in the slides.
  I don't think there's anything different there to see. Now this comes right out of EMP, and what I'm doing is asking: given the noise in my measurement system, what's the product conformance I can expect
  with my specification limits, and can I improve that by tightening my specification limits? So I'm going to use statistics to guard band
  my spec limits. So for those who are in high-tech manufacturing, you've probably heard of guard banding spec limits. Sometimes it's done as just a percent.
  But I love EMP because I can use the probable error to make a calculation
  that takes my measurement system noise into account when I tighten those limits. So what this is saying is, hey, for 96% conformance,
  I need to tighten my bands. I forgot to adjust the decimal places here, but that would be 56 to 74.
  So I just tighten it by one millimeter. That's it, on either side, either specification. The black lines are the original specification limits, the ones that we set for our customer.
  If we run the gauge at these specification limits, we can expect about 64% conformance, so there's a trade-off here, right?
  If we want to tighten the spec limits, it's going to cost us. We're going to be rejecting more parts, but is it going to protect the customer a little bit better? Most certainly. In fact, if we go four probable errors in from the spec limit,
  we can get 99.9% conformance, but we're going to be rejecting a lot more product and some of that product could actually be useful to the customer.
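  The guard-banding arithmetic can be sketched as follows, assuming we pull each specification limit inward by some number of probable errors. The worst-case conformance figure here is for an item whose measurement sits exactly at the tightened limit, with normal measurement error; Wheeler's book is the authoritative source for the specific manufacturing-specification tables, so treat this as an illustration of the mechanics only.

```python
# Illustrative probable-error guard banding of specification limits.
# Assumptions: PE = 0.675 * sigma_e; limits are tightened by k * PE.
from math import erf, sqrt

def guard_banded_limits(lsl, usl, sigma_e, k):
    # Pull each spec limit inward by k probable errors
    pe = 0.675 * sigma_e
    return lsl + k * pe, usl - k * pe

def conformance_at_limit(k):
    # P(true value inside the original spec | measured exactly at the
    # guard-banded limit), one-sided, normal measurement error:
    # Phi(k * 0.675), using Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    z = k * 0.675
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))
```

The trade-off in the talk falls straight out of this: a wider guard band (larger k) raises the conformance assurance at the limit but rejects more product.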
  The trade-off here is voice of the business versus voice of the customer; it's a business decision, and it's worthy of some discussion before you decide
  how much you want to tighten your specification limits. Okay, that is the well-behaved gauge, and we saw...oh, I forgot, I was going to show
  how I go about the decimal place precision problem. So a coordinate measurement machine spits out a lot of decimal places.
  EMP is telling me that I really should drop digits. The way I do it is I just create a formula column using a rounding function, and then I adjust this (zero is where I am now, but I actually tried two and one)
  and reran the EMP analysis. As soon as EMP tells me that I don't have to drop a digit anymore, that's the precision that I'm going to report.
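  The formula-column trick is simple enough to sketch. This hypothetical helper (my own name, not a JMP function) rounds a reported value to a chosen increment; the iteration described above is to rerun the EMP analysis after each change until the effective resolution report stops asking you to drop digits.

```python
# Illustrative rounding of a reported value to a measurement increment.
# round_to_increment is a hypothetical helper, not a JMP function.

def round_to_increment(value, increment):
    # e.g. increment = 1 reports whole units, 0.5 reports half units
    return round(value / increment) * increment
```

Rounding to an increment is slightly more general than picking a number of decimal places, since the increment need not be a power of ten.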
  Okay, and the manufacturing specifications. Again, this is another place where the probable error and the measurement increment are used in the calculation.
  It's great. It's a statistical approach to something that's common in high-tech manufacturing: guard banding. We can do that using our measurement system noise to
  make the calculation. All right, back to the dreaded bathroom scale. All right, so this is a little fun.
  Let me walk you through the design.
  What I think is kind of fun about this is that when we set about to do an experiment, whether it's a DOE or measurement systems analysis,
  we have to bring a little creativity to bear. And I had to; I almost got stumped by this, to be honest. So what are the noise factors associated with me, or anybody in my family, weighing ourselves? Well, there's us, you know. I stand on the scale.
  Part...think of part as my daily weight fluctuation, right?
  Let's say I'm at plus or minus four; I ate too much for a week and I gained two pounds. That's the thing that I want to be able to detect.
  Now getting that is kind of tough, actually, because I have to break out that daily weight fluctuation from the operator part, the me part of it.
  The way I went about it...there are probably a lot of different ways to do this, but the way I went about it is, I have my participants in the measurement systems analysis, my family. They would measure their body mass,
  step off the scale, I hand them a dumbbell, and they step back on the scale. I subtract those two measurements and I get the weight, so that'll be my part measurement.
  Now, to get this on the scale of my target weight, which makes it a little easier for me to understand, I just normalized it.
  It's just changing the scale to something that makes a little more sense for me. Okay, let's see how this looks. All right, we'll do the add-in again. There's something that's worthy of
  note. Okay, I think if I recall this, it's going to remember operator and part again. This is a chunky measurement, so I'm going to use the range chart dispersion type, and the design is crossed.
  Okay, already, right out of the gate, we're seeing that we probably can't detect
  differences in daily weight. Now there could be something in here that's associated with how I ran the experiment; it's certainly possible,
  but it's not looking really good. So let's look at a few others. The range chart, that's looking pretty good. Ah...whoa, there's a problem. My within variation, the one associated with my repeat measurements, well, that's bigger than the variance associated with my part-to-part variation.
  That's not a good thing.
  So, problem two. You can kind of go through some of these others. Test-retest looks pretty good. My bias comparison analysis of means results look good. All right, here's another problem. Look at the size of that probable error. Remember, I'm really hoping that I can see at least
  a one-pound difference in weight and have that be meaningful. Well, the probable error is three times greater than that, and that's a problem.
  I'm being told to drop a digit. If I iterate through this, I actually have to go into scientific notation, report two digits, to get it even close.
  And even still the probable error is just huge, and we're going to see, kind of a little further down, how that can be problematic in terms of using the gauge.
  Right, a third-class gauge. So if we had to use the gauge, we would definitely be charting the measurement process, so running controls or standards
  before we do our daily weight. Operator/part interaction, there it is again. This is a funny one because operator three is my brother-in-law. He's kind of a smart guy, and as much as I tried to make it not possible for him to game the test, I think he was gaming the test.
  So that could be associated with my study and not necessarily with the gauge; just my observation. So we'll play that 96% conformance game again, dial that down to zero. And oh no, I can't compute the limits. In fact, my lower limit is actually
  higher than my upper limit. And why did that happen? Well, in the calculation
  (let's see, if I back up to where I showed that),
  if that probable error is too big, then it's going to create a problem, so there is a little bit of a limitation to this
  tool that Wheeler uses in his book. But for me, hey, what this is telling me is that this bathroom scale is a piece of junk and I just need to buy a new one. So I'm kind of cheap,
  but I decided, you know, 50 bucks is probably not an unreasonable price, so I bought a Fitbit scale and we'll see what...let's see what that looks like.
  Run the study again.
  And range, we'll keep that the same. All right, already we're seeing a little bit of improvement. It's not as good as my coordinate measurement machine, but, hey, some of my measurements are actually falling outside the control limits, so I would count that as an improvement.
  Also, look at that: my within variance component is actually less than part. I mean, it's still kind of big,
  but 50 bucks...maybe if I spent 100 or 1,000 dollars, I could get that down a little bit, but I think I'm willing to live with that. All right, so probable error, hey, yay. We're at least under a pound.
  And of course, we need to consider dropping a digit, and so I would go through that rounding exercise with this as well, and it turns out that reporting zero decimal precision is what we want to do. And let's take a look at this, 96% conformance.
  Again, hey, I actually get something that's reasonable. So for me,
  what I set as my expectation going into this whole exercise of losing weight was that I wanted to be at a target weight of 180.
  But I'm willing to live with plus or minus five pounds of weight fluctuation. And as I've gone through this journey, I've noticed that things I eat can change my water weight within a day, so that's a pretty reasonable expectation. Now if I want to be sure
  that I'm staying true to my goal, maybe I need to bring those specification limits in by a pound.
  So what started out as kind of a frustrating exercise, through a fairly long journey of losing weight and working with a horrible scale, has ended with something that I can live with and that I'm actually kind of happy about. So
  that's all I have. Thank you. Hopefully, this has been an entertaining walk through the EMP method of measurement systems analysis.
Olivia Lippincott Yeah, stop the recording.
Jason Wiggins, JMP And I did it in time, oh good.

Thank you for the presentation. Great stuff. Based on your recommendation, I picked up a copy of EMP III and so far it's a great read. I think your add-in looks quite promising. I think something like that would come in handy. Being able to perform a guardband analysis  on a measured parameter would be great. Being able to perform a similar analysis on a lot of parameters at the same time, something like the Process Screening platform, would be fantastic. The method of identifying appropriate guardbands based on carefully collected data in combination with consistently applied mathematics is an improvement over arbitrarily assigned guardbands based on percentages. I look forward to investigating these concepts further and applying them in practice. Thanks again for the presentation, and bringing these analysis methods to my attention.

Thank you for your kind feedback, Nathan! I have a little more work to do on the add-in and will post soon. 


@Jason_Wiggins Is EMP add-in available? 

Hi Daisy. As of right now, I still have some code that needs to be cleaned up. I will move this up in my queue and get something out soon. Thank you for the kind reminder!

Hi @szdaizha, @ngambles,


I have uploaded the add-in. There is a help document and sample data file in the add-in menu. The help document contains a recipe for using the add-in as well as background and references. Let me know if it works for you or if you have issues. I will post to the file exchange soon as well.




Hello Jason, the add-in did not work in my JMP 16.2; I could not enter specification limits or generate the gauge performance plot. Can you help take a look? Thanks! @Jason_Wiggins