IMHO, the problems with Gage R&R have less to do with the analysis and more to do with the data collection strategy. Issues include:
- Gage R&R is biased toward the measurement system components: several layers of the study are measurement system sources (e.g., gage and operator), while only one layer represents sample variation for comparison.
- Folks don't recognize the importance of selecting samples for the study. What variation do the samples represent? Since you are comparing the measurement variation to the sample variation, choosing samples that are similar (e.g., 5 consecutive samples) vs. samples that may vary wildly (e.g., random samples) can lead to completely different conclusions about the capability of the measurement system.
- Unfortunately, folks tend to do a gage R&R once and "christen" the measurement system as good forever. The results of the study get applied beyond its inference space.
- With mixed models (crossed and nested), it is difficult to assess the stability of the measurement system. Control charts are not appropriate for crossed studies.
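
To put rough numbers on the sample selection point, here is a minimal sketch in Python. It assumes the simplest possible model (independent measurement and sample variation, so the variances add) and uses made-up standard deviations; the function name `percent_grr` and all the values are hypothetical, not taken from any particular study or standard.

```python
import math


def percent_grr(sigma_measurement, sigma_samples):
    """Measurement SD as a percentage of total observed SD.

    Assumes measurement and sample variation are independent,
    so the total variance is the sum of the two variances.
    """
    sigma_total = math.sqrt(sigma_measurement**2 + sigma_samples**2)
    return 100.0 * sigma_measurement / sigma_total


# Hypothetical measurement system: the same SD in both scenarios.
sigma_measurement = 1.0

# Scenario A: similar samples (e.g., consecutive parts), little spread.
similar = percent_grr(sigma_measurement, sigma_samples=0.5)

# Scenario B: samples chosen to span much more variation.
varied = percent_grr(sigma_measurement, sigma_samples=5.0)

print(f"similar samples: {similar:.1f}% of total SD")
print(f"varied samples:  {varied:.1f}% of total SD")
```

The measurement system is identical in both scenarios; only the samples change, yet the ratio moves from roughly 89% to roughly 20%. That is why the question "what variation do the samples represent?" has to be answered before the study, not after.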
Don Wheeler has written some good papers on the subject:
Wheeler, Don (2006) “An Honest Gauge R&R Study”, 2006 ASQ/ASA Fall Technical Conference, No.189
Wheeler, Donald (2020) “Gauge R&R Methods Compared: How do the ANOVA, AIAG, and EMP approaches differ?”, ASQ Statistics Division Newsletter, Vol.39, No.1
"All models are wrong, some are useful" G.E.P. Box