Measurement Systems Analysis for Curve Data Using Functional Random Effects Models (2023-EU-30MP-1316)

1 Kudo

Colleen McKendry, Senior Statistical Writer, JMP Group Inc

This presentation is an extension of a case study presented at a Discovery conference a few years ago, where a client’s protocols required a Gauge R&R study to be performed before running a functional response designed experiment. As in a standard Gauge R&R study, there were several Parts, several Operators, and several replicates per combination of Part and Operator. However, in this case, the test equipment returned a set of curves as the response instead of a single point. A functional random effects model is appropriate for this type of data. In this application, the functional model is expanded using basis splines and then expressed as a mixed model, where variance components can be estimated using standard methods. This is done using the Functional Data Explorer and Fit Mixed platforms. Due to the functional model expansion, multiple variance components may be associated with each of the Part, Operator, and Part*Operator terms. It is shown that these variance components can be summed and written in the form of a standard Gauge R&R computation, therefore providing a Functional Gauge R&R analysis.

Hi, my name is Colleen McKendry, and I am a senior statistical writer at JMP, but I also like to experiment with functional data. This presentation is an extension of when Flash was inspired by a presentation that was originally done in 2020 titled Measurement Systems Analysis for Curve Data.

There was also a slightly earlier presentation at a JSM conference in 2019. My talk is essentially how I would go about solving the problems that were presented in those original papers. I'll discuss them a little bit more later too.

First, a little bit of background on measurement systems analysis. MSA studies determine how well a process can be measured prior to studying the process itself. So it answers the question, how much measurement variation is contributing to the overall process variation.

Specifically, the Gage R&R method, which I'll be using later in this analysis, determines how much variation is due to operation variation versus measurement variation. These types of studies are important and they're often required to be performed prior to any type of statistical process control or design of experiments.

A Gage R&R classical MSA model is shown here. For a given measurement, your response, Y sub I K is the Kth measurement on the Ith part. In this model, you have a mean term and a random effect that corresponds to the part and your error term.

The random effect in the error term are normally distributed random variables with mean zero and some corresponding variance components. This is simply a random effects model, and we can use that model to estimate the variance components, and then use those variance component estimates to calculate the % gage R&R using the formula that's shown on the screen.

Here we have the same model but the crossed version. For your response, Y sub IJK , that's going to be the Kth measurement made by the Jth operator on the on the Ith part. A gain, we have a mean term, a random effect that corresponds to the part, and now we have a random effect that corresponds to the operator, and a random effect that corresponds to the cross term, which is the interaction between the operator and the part, and of course, our error term.

All of these random effects are normally distributed random variables with mean zero and some corresponding variance component. So just like in the classical model, this is just a random effects model, and we can estimate the variance components and use them to calculate the % gage R&R.

In both of the models that I just described, the response or the measurement was a single point. But what happens if this isn't the case? What if your measurement is something like a curve instead? This was the motivation behind those initial presentations in 2019 and 2020 that I talked about.

There was a client of JMP that was a supplier of automotive parts, and they had a customer that specified that a part needed to have a specific force by distance curve. Obviously, the client wanted to match their customer specified curve, and so they wanted to run a functional response DOE analysis in JMP to design their product to do that.

However, before spending the money on this type of experiment, they first wanted to perform an MSA on their ability to actually measure the parts force. There's a lot more details in the paper noted at the bottom, so if you're interested in more of the background, please see that.

This is what the data looks like. We have force on the Y axis and distance on the X axis, and the curves are colored by part. It looks like there are only 10 curves, but there are actually 250 curves in total. It's just that a lot of the curves are clustered together.

In the data, there were 10 parts, five operators, and five replications per part operator combination. I just wanted to note that this is simulated data, and it's simulated to look similar to the actual data, but that we aren't sharing any proprietary data here.

A few function characteristics that I wanted to point out, the functions are actually all different lengths, so they have a different number of observations in their curve. Although the functions were collected at equally spaced time intervals, they were not collected at equally spaced distances. That means there's no true replication in terms of distance.

When this project was first presented, one of the original ideas thrown out was whether we could pick a set of distance locations and do a standard gage R&R MSA at each of those locations and then summarize that information for a final result.

The problem with that is that if we picked a specific location, there wasn't a guarantee that there would be an observation for each of those curves because there wasn't replication for the distance. Another problem that's more generalized is with this type of curve data, doing point wise analysis like that does not take into account the within function correlation.

Luckily, there's a whole field of statistics dedicated to this type of data called functional data analysis. There are a variety of techniques to handle unequally spaced data. A lot of those techniques are now available in JMP through the Functional Data Explorer platform.

The question became, can functional data methods be combined with traditional MSA methods to perform some type of functional measurement systems analysis? This was the solution that was presented in the older papers that I referenced. This is just going to be a little bit of a review of what they did.

First, a penalized spline model was fit to estimate the part force functions and so there were 10 functions that were estimated, averaged over operator and replicates. Then these functions were subtracted from the original force functions to obtain a set of residual force functions. These residual functions no longer contain any variation due to the part. All of the variation in those residuals were due to the operator and the replicates.

They then fit a random effects model to the residuals to obtain the corresponding variance components from the model. A graphical method was then used to find the smallest part variance to use as an estimate for the part variance component.

This was then used to calculate a type of worst case scenario % gage R&R. This method they worked fairly well. They got results that made sense and the client was happy. But this was just a generalization of a standard MSA with some functional components sprinkled in.

When I looked at this data and when I looked at the problem, I wanted to try to take a more traditional functional approach. I have a background in functional data analysis, and that is what my dissertation was on, specifically functional mixed models. There was a chapter in my dissertation dedicated to estimating and testing the variance components from functional mixed models.

I did that by expanding the functional model using eigen function or basis function expansions and rewriting it as a mixed model and then using known techniques to estimate those variance components. I started to think, could I use the same type of technique?

I don't need a full mixed model. I only have random effects here. Can I create a functional random effects model for the part and operator variance components?

This is what I came up with for a functional MSA across models, since we do have an operator term. For functional models, they're set up a little bit differently because they're all based around the input. In this case, your response, Y sub IJK is the Kth replicate made by the Jth operator on the ith part, but this time at a particular distance, D.

We have a functional mean term, a functional random effect that corresponds to the part, a functional random effect that corresponds to the operator, and a functional random effect that corresponds to the cross term and our error term.

In this method, I subtract the mean term over, and so I'm left with this set of residuals, and that's when I'm going to model. This here represents the eigen function expansion of the functional model. We're going to have capital B eigen functions and sum all of those parts together to create this big long random effects model.

But for one eigen function, what is shown in these brackets is the expansion. For each functional random effect, it's split into two parts. We have a functional part and then just a regular part. The functional part is taken care of by evaluating those eigen functions. Then we have just standard random terms for the part, the operator, and the cross term.

Then what this essentially does is you build this long random effects model, and then you have a number of variance components for each term. For the MSA, there will now be three sets of capital B variance components. There's going to be capital B part variance components, capital B operator variance components, and capital B cross term variance components.

Because of the way eigen functions are structured, they are known to be independent and so we can assume that based on how we structured the model, all of these variance components are actually independent from each other also. That means we can sum them together to obtain these functional variance components.

Since we're just summing them together, we can also substitute them into the formula for the % gage R&R and compute that just like we did in the standard models.

How do I actually do this in JMP? Well, it's a multi step process. I'm going to briefly outline it here, and then I'm going to do a demo for you. First, I'm going to estimate the mean curve in FDE and obtain the residual curves. I'm then going to model those residual curves serves also an FDE to obtain the eigen functions needed for the eigen function expansion.

I'm going to save those eigen functions to the original data table and use them in FitMix. Using FitMix, I'm going to fit a random effects model to the original data using nesting and the eigen functions to define the appropriate model specifications.

Hopefully, that all makes a little bit more sense once I demo it. We're going to exit out of here. Here is our data, and we have a column for the ID variable, a column for the part variable that defines the 10 parts, the operator, which defines the five operators, our distance column, and our force column.

Just as a reminder, this is what the data looked like. My first step is to estimate the mean function of the force curves and then use that to obtain some residuals. To do that, I'm going to model the force functions in FDE. I'm going to go to the Analyze menu, Specialize Modeling and select Functional Data Explorer.

I'm going to define force as my output, distance as my input. Then because I want the mean function averaged over all of the IDs, I'm not going to specify an ID function here. I'm going to click Okay. We have our basic intro FDE report. But I want to fit a model, so I can go to the red triangle menu, models.

Technically, you could fit any of those models. I just chose a B spline because it's first, it's easy, and it'll just take a few seconds to run here. Okay, so here's our model fit. There's a red line here that's pretty hard to see, but that is what is the mean function, the estimated mean function. I'm going to give you a better picture of that in a minute or so.

But I actually want to save the functions for this meme. I can do that by going to this Function Summaries report. I can click the red triangle menu and select Customize Function Summaries. I only want the same formulas, so I'm going to deselect them all and then reselect that one and click OK and Save.

I get a new data table with what appears to be this lonely little entry here. There is a hidden column, so I'm going to unhide that. We have a distance column and then we have this force mean functional formula. I'm going to take a look at that, what that actually looks like.

When we look at the formula column, we can see that this is a function of distance. For any value of distance, this is going to be evaluated to give what the mean function is at that distance. This formula column can be put into any data table that also contains a distance column. That's exactly what we're going to do.

We're going to make sure this is highlighted. We're going to right click and select copy column properties. Then we're going to find our way back to our original data table. Double click to create a new column. I'm going to right click here and do paste column properties. Now we have the mean force evaluated at every level, every distance value in our data table.

We can use that to now find our residual function. I'm going to double click again to create a new column and title it Force Resids. Now I'm going to create my own formula column that is simply going to be force minus the mean force. I'm going to click OK. Now we have our set of residuals and this is what it looks like.

In the top graph, these light gray curves are the original functions from the force column. This red line is the same red line that I tried to show you in the FDE report that was hard to see, but this is what that was. That's the mean function.

Then the bottom graph in green shows the residual curves. These are the curves that I'm going to use to proceed with my analysis. My next step is going to be to model the residual curves using FDE to obtain the eigen functions that I need for the model expansion. I'm going to go to the Analyze menu again, select Specialized Modeling, Functional Data Explorer.

This time I'm going to specify the residuals as the output, distance as the input, and I am going to specify my ID column this time. I'm going to click OK. We have our initial port here, and now I want to fit a model to this data. I go to the red triangle menu to look at the models.

Again, technically, you can fit any of these models. In my experimentation, I found that these top three, they took a really long time to fit, and they didn't provide super great fits for what I needed. The wavelets models and the direct functional PCA were much, much quicker and while also providing better fits.

The caveat with those two models is that they require the data to be on an evenly spaced grid. A s I mentioned, when I introduced the data, that's not the case. However, in FDE, we have some data processing steps that help us manipulate our data a little bit. We can go to the clean up, reduce, and this first tab is what we want, and we can use that.

Now that just puts it so that every distance value has an observation, has a force residuals observation. Now we can go ahead and fit one of those models. I just chose direct functional PCA. A s you can see, it was very quick. The fitting was super fast.

This functional PCA report is where we're going to get all of our information that we need. But I was just going to scroll down to look at the data fit a little bit. We can see that these look pretty good. Then if we look at the diagnostic plots, these are on the diagonal, the residuals look good. This looks like a pretty good fit, and I can use this information from the Functional PCA.

In the Functional PCA report, we have a table of eigen values and then these graphs of our shape functions. The shape functions are actually our eigen functions. They're just called shape functions in JMP. How these functions work is that your original input, so distance, is on the X axis, and then the eigen function evaluation is on the Y axis.

For any distance D, you're going to have an evaluation at eigen function 1 , an evaluation at eigen function 2, and so on. You can use the eigen functions to get... You can use a linear combination of the eigen functions to get an estimate of the original functions.

The eigenvalues table gives you an idea of how much % of the overall data variation you're taking into account when you use a certain number of eigenvalue, eigen function pairings. In this case, the first eigenvalue and eigen function pairing actually accounts for 99.9 % of the total variation in the data, which is actually pretty incredible.

This is important in determining how many eigen functions you're going to use for the basis expansion. Typically when you're selecting a number of eigen functions to use, you don't actually want to use all of the eigen functions that you're given.

You want to use the least number of eigen functions that still account for an adequate amount of variation in the data. This is because the more eigen functions you use, the bigger your random effects model is going to be, and it's going to be harder to estimate.

What does accounting for an adequate amount of variation in the data mean? It can mean different things to different people in different fields. Typically, when I'm working with this, I have used in the past about 90 % as a cut off. If I was just doing this analysis to do the analysis, I might only take this first eigen function since it explains so much, and run with that.

For demonstration purposes, I'm going to take the first two just so you can see what the model expansion looks like using two eigen functions. To save these, I'm going to go to this Function Summaries report again, click on the red triangle menu and select Customize Function Summaries.

I just mentioned that I'm only going to save two of them, so I want to enter two here. I'm going to deselect these all again and only save the formulas. I can click OK and Save and I have another new data table now. The things that I need are actually hidden. We want to look for these force resids shape functions which represent our eigen functions.

We can unhide those so that they're now included in our data table. We can take a look at these formula columns. Just like with our mean function, this is simply a function of distance. For any value of distance, these formulas are going to give you what the eigen function value is at that distance.

Also, like the main function, we can put these formula columns into any data table that also contains a distance column. That's what we're going to do again. We're going to put these formula columns into our original data table.

We're going to make sure both of these are highlighted and right click and select copy column properties. Again, find our way back to our original data table. I want to add two new columns to my data table. I'm going to go to calls, new columns. We're just going to title them E1 and 2 to represent the eigen functions. I want to add two columns and I want to add them as a group. I'll click okay.

Then with these two new columns highlighted, I'm going to right click and select paste column properties. Now we have our eigen functions evaluated for every distance.

Just to give you an idea, this is what they look like and it's the same graph. It's almost the same graph as what was in the FDE report. We're just taking what was graphically there and now we have the numbers for everything, for every value here.

Now we want to do the eigen function expansion and expand our functional model. But what does that model actually look like when you have two eigen functions? I'm going to hop back over to my slides real quick and show you what the model expansion looks like when capital B equals 2.

I have this divided into a section for the part, the operator, and the cross term. Then within each of these sections, we see that we have a term that involves eigen function one and a term that involves eigen function two. Essentially, this means that we're going to have two variance components for part, two variance components for the operator, and two variance components for the cross term.

Now I'm going to go back to my data table. I want to fit this model using Fit Model. I'm going to go to the Analyze menu and select Fit Model. I want to specify my personality as the mixed model. Now I'm going to specify the residuals as my Y. Then I'm going to move down to the effects section.

In the fixed effects tab, I don't have any fixed effects and I also don't have an intercept because I mean term over originally. Now I'm going to move to the random effects tab. Here is where I'm going to use the eigen functions and the part and operator variables and nest them in an appropriate way to define the model that I just showed you.

We can add both of these eigen functions and we're going to select these and also select part. We're going to nest part in each eigen function. Then we're going to do the same thing for operator and the same thing for the cross term. That's how we're going to define our model.

The last thing I want to do in this launch window is deselect the Unbounded Variance Components option. When this is selected, it means that you can have negative estimates or variance components, and I don't want that.

Any negative estimates are just going to be set to zero. Now I can run this, and we have our report here. This is the table that's going to give us our estimates that we need to calculate the % gage R&R, but I'm going to poke around the report a little bit first.

This actual by predicted plot looks really weird at first, but it makes sense when you think about it. Since we don't have any fixed effects or an intercept, when we don't take the random effects into account, our estimate for everything is just zero. When we do take those random effects into account, we see that the actual by conditional predicted plot looks a lot better and that these observations fall pretty well along this diagonal.

We can also take a look at the conditional residual plots, and we can see that they're pretty small and they're centered around zero. We do have some deviation from this line here, but there's nothing super crazy about the residuals. I feel okay about using these estimates to calculate the percdent gage R&R. I actually pulled this table and put it back into my slides. I'm going to go back to my slides for the remainder of the presentation.

Here's that data table, not data table, the report table that was just there. As you can see, you have a variance component, S2 variance, components, estimates for part, two for operator, and two for the cross term. A s I mentioned when I was describing the model, we can sum these together to calculate the functional variance components.

These specific numbers aren't as important in this analysis as what you get when you put them into the formula for the % gage R&R. So when I do that, we get a % gage R&R of 3.3030. That is what Baren team defines as an acceptable measurement system. If this had been my project, I would have gone back to the client and said that it seems like your measurement system is accurate, you can go ahead and proceed with your design of experiments.

That's basically it for this analysis for this particular data. Just some thoughts that I had. This result was actually very similar to the worst case scenario % gage R&R that was presented in the 2019 JSM presentation. It was higher by just a few decimal places. It'd be really interesting to compare these methods and other data sets to see if they are always similar or if this was just a happy coincidence.

I don't have much experience at all with measurement systems, and so I don't have any other data to play around with or even really know how to obtain it. If anybody has any data that they think might apply to this type of project, any functional data that also they might want to do an MSA on, I'd be really interested in hearing about it.

For some future work, some thoughts that I had was, the first one was, should I add a functional random effect for the ID? This is very commonly done in a lot of functional mixed models, at least in the fields that I worked in. This was a big contribution in my dissertation was the use of this functional random effect for ID.

Typically, this captures the within function correlation across, in this case, distance. I played around with this random effect in this data, and every model I used, the number of eigen functions I used, it didn't matter. The variance component associated with this random effect always came out to be zero, and that's not useful.

I think in this case, once you took into account the variance from the part and the operator, there just wasn't any variation left to account for. I don't know if that's true for all functional MSA studies or if this was just true for this particular data. If I was ever able to get my hands on some different data, this is definitely something I would keep in mind to see if it could be added to a model in any other data sets more successfully.

I also think it would be cool if we could calculate a confidence interval for the % gage R&R. And finally, I wanted to talk about the one thing I wasn't super happy with in this project, which was the residuals. What was wrong with them?

These are graphs of some different models and the residuals for each one. I'm going to go back and forth between this slide and the next one. So yes, the residuals are relatively small. They're centered around zero. There's no crazy spikes or outliers, and that's good. That's what we want.

However, in all the models I fit, I just still didn't love how they looked across distance. Looking at the residuals this way is especially important when working with functional data. This is because it can really show when you're not capturing all the functional parts of the data.

A lot of times in functional data, you see this fanning effect where the residuals are really good in the beginning, and then as you get towards the end of your domain, they fan out a little bit. This data actually had almost the opposite problem. We can see that the residuals are a little bit wider in the beginning of the domain and get closer to zero as distance gets larger.

There's also definitely some type of cyclical pattern in these residuals. I don't think it's the end of the world. I think they're super centered around zero, but you can see in these graphs that there's clearly some up and down patterns. Essentially what that means is that I'm missing something. I'm not capturing the full functional nature of the data, and I don't really know why yet.

I'd really like to figure that out and fit an even better model, whether that's possible with this data or different data in the future. I'm not sure, but it's definitely something I want to spend a little more time on, and I'd be open to any discussion anyone would like to have about it.

That's it for me. Thanks for watching. If you have any questions, suggestions, questions or feedback, feel free to email me. Thank you.

martindemel · ‎03-28-2023

Here's the presentation from 2020 Discovery Summit Colleen refers to: Measurement Systems Analysis for Curve Data (@Astrid_Ruck , Autoliv)