LOESS (locally estimated scatterplot smoothing, aka LOWESS, locally weighted scatterplot smoothing)
The idea is as follows. It is a bit intricate, but GENERALLY applicable. Your measuring device allegedly measures only property A (magnetic field, pH, it doesn't matter) and allegedly responds ONLY to property A. However, you, the keen metrologist, realize this is a fallacy: the device is, unfortunately, also affected by property T (typically temperature). Your research shows that the device's reading is truly a function of both property A AND property T, and you have established the T effect quantitatively. That work is done and is the required background. HENCEFORTH and forevermore, thou shalt measure property T whenever using the device to measure property A. The measurement of property T can be considered a "subsidiary measurement".

Now you have some 20 or so samples, each needing its property A measured. You do so, and for every one of the 20 samples, or perhaps for every 2nd or every 4th, you ALSO measure property T. From your research you know how to correct the systematic error in each sample's reported A value, given the subsidiary T value for that sample.

EVEN IF you have a T measurement (think temperature) for each sample, do you really think the "BOUNCY" T values are as close to the truth as you can get? They could be closer to the truth than the alternative I'm about to present, but that is NOT likely, just as measurement #2 MIGHT be closer to the truth than the average of 6 true replicate measurements but is NOT LIKELY to be.

What's the alternative to the "BOUNCY" T values? Time-trended, or smoothed, T values. Especially for temperature: we don't expect temperature to bounce up and down over time, so the smoothed trend is LIKELIER to be close to the truth than the observed bouncy T values. Yes, that does mean you also have to record elapsed time in order to smooth the observed T values.
Time is usually unproblematic. But even merely smoothing by "sequence number" probably gets you closer to the truth about T (think temperature) than taking the bouncy values as-is. And it's NOT an issue of "make pretty": the issue is using T values AS CLOSE TO THE TRUTH AS POSSIBLE in order to make the systematic-error compensation/correction as accurate as possible. You need the historical research to know how T affects the instrument's report of property A (allegedly, but NOT actually, "all the instrument cares about"). But once that research is done, you ALSO need the best T values you can get. Those are obtained by SMOOTHING.

Smoothing is BEST DONE not with a spline (too wiggly) or the horrendously bad moving average, but with LOESS: locally estimated scatterplot smoothing (also known as LOWESS, locally weighted scatterplot smoothing), which is a locally weighted least-squares regression. Its only real rival is Distance-Weighted Least Squares (offered in Statistica). LOESS is in S+ and also in R, the free implementation of the S language (as loess() and lowess()).

MOREOVER, if you didn't get a T value for EACH AND EVERY sample, you'll need to LOESS-interpolate (that is, use the LOESS regression to estimate T for the samples for which, for whatever reason, you didn't measure T), BECAUSE each and every measurement of property A, in truth, requires a T value to fix the systematic error (based on that research).

Sorry about a couple of things: first, my inductive writing, proceeding step by itty-bitty step toward the punchline (so, if you've made it this far, go back to the top and see the inexorable logic); and second, the relative lack of concreteness. The latter is not really too bad, but I didn't want to get very specific because you'd get the wrong notion.

TRENDING (SMOOTHING and, if need be, INTERPOLATING) OF SUBSIDIARY MEASUREMENT VALUES TO GET CLOSER TO THE TRUTH FOR THOSE VALUES, FOR THE SAKE OF FIXING SYSTEMATIC ERROR IN AN ENTIRELY DIFFERENT MEASUREMENT, IS A GENERALLY APPLICABLE METROLOGICAL STATISTICS PRINCIPLE. Yet it's one I've seen discussed precisely NOWHERE other than in my own writings.
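For anyone who wants to see the smooth / interpolate / correct sequence in one place, here is a minimal Python sketch. It is one illustrative case only, not the general principle. Everything specific in it is an assumption made for the example: the simulated data, the linear correction model A_read = A_true + C_T * (T - T_REF), the values of C_T and T_REF, and the hypothetical helper loess_at with its frac parameter. The smoother is a bare-bones tricube-weighted local linear fit in the LOESS spirit, with no robustness iterations; it is not the S+ or R implementation.

```python
import math
import random

def loess_at(x0, xs, ys, frac=0.6):
    """Tricube-weighted local linear fit, evaluated at x0 (x0 may lie between the xs)."""
    k = min(len(xs), max(3, math.ceil(frac * len(xs))))  # points in the local window
    dists = sorted(abs(x - x0) for x in xs)
    h = max(dists[k - 1], 1e-12) * 1.0001    # bandwidth just past the k-th nearest point
    sw = swx = swxx = swy = swxy = 0.0
    for x, y in zip(xs, ys):
        u = abs(x - x0) / h
        if u >= 1.0:
            continue                          # outside the window
        w = (1.0 - u ** 3) ** 3               # tricube weight
        d = x - x0                            # centre on x0 so the intercept is the fit at x0
        sw += w
        swx += w * d
        swxx += w * d * d
        swy += w * y
        swxy += w * d * y
    det = sw * swxx - swx * swx
    if det <= 1e-9 * sw * swxx:               # degenerate window: fall back to weighted mean
        return swy / sw
    return (swxx * swy - swx * swxy) / det    # intercept of the weighted least-squares line

random.seed(1)
C_T = 0.5      # ASSUMED sensitivity of the A reading to T, taken as known from prior research
T_REF = 20.0   # ASSUMED reference T at which the instrument reads A correctly

times = list(range(20))                       # elapsed time of each of 20 samples
t_true = [20.0 + 0.15 * t for t in times]     # simulated truth: slow, smooth drift in T
a_true = [7.0] * 20                           # simulated true property A
a_read = [a + C_T * (T - T_REF) for a, T in zip(a_true, t_true)]  # reading contaminated by T

# T is measured on every 2nd sample only, and each T reading is noisy ("bouncy").
t_times = times[::2]
t_obs = [t_true[t] + random.gauss(0.0, 0.3) for t in t_times]

# Smooth AND interpolate: evaluate the local fit at every sample time.
t_hat = [loess_at(t, t_times, t_obs) for t in times]

# Correct every A reading with the trended T value for that sample.
a_corr = [a - C_T * (T - T_REF) for a, T in zip(a_read, t_hat)]
for t, a_r, a_c in zip(times, a_read, a_corr):
    print(f"sample {t:2d}: read {a_r:.3f} -> corrected {a_c:.3f}")
```

Note that the same call handles both jobs: at a time where T was measured it smooths, and at a time where T was not measured it interpolates. With real data you would replace the simulated lists and the assumed correction model with whatever your own research established.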
LOESS is the way to do the trending/smoothing/interpolation; the runner-up is DWLS (distance-weighted least squares, in Statistica). Finally, LOESS as implemented in S+ is VERY robust against outliers when run with its robust fitting option (family = "symmetric"), so you get closest to the truth via LOESS, and you therefore make the most ACCURATE compensation/correction to the reported property A values.
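To show where the outlier resistance comes from, here is a rough Python sketch of robustness iterations in the style of Cleveland's lowess: fit, compute residuals, down-weight points with large residuals using bisquare weights scaled by 6 times the median absolute residual, and refit. The helper names local_fit and robust_loess, the frac and iters parameters, and the simulated T series are all assumptions for illustration. This is a simplified stand-in, not the S+ code.

```python
import math
import statistics

def local_fit(x0, xs, ys, rw, k):
    """Local linear fit at x0: tricube distance weights times robustness weights rw."""
    dists = sorted(abs(x - x0) for x in xs)
    h = max(dists[k - 1], 1e-12) * 1.0001     # bandwidth just past the k-th nearest point
    sw = swx = swxx = swy = swxy = 0.0
    for x, y, r in zip(xs, ys, rw):
        u = abs(x - x0) / h
        if u >= 1.0:
            continue
        w = r * (1.0 - u ** 3) ** 3           # combined weight
        d = x - x0
        sw += w
        swx += w * d
        swxx += w * d * d
        swy += w * y
        swxy += w * d * y
    if sw <= 0.0:                             # whole window down-weighted: plain tricube fit
        return local_fit(x0, xs, ys, [1.0] * len(xs), k)
    det = sw * swxx - swx * swx
    if det <= 1e-9 * sw * swxx:               # degenerate window: weighted mean
        return swy / sw
    return (swxx * swy - swx * swxy) / det

def robust_loess(xs, ys, frac=0.6, iters=3):
    """Fitted values at each xs after up to `iters` bisquare reweighting passes."""
    n = len(xs)
    k = min(n, max(3, math.ceil(frac * n)))
    rw = [1.0] * n
    fit = [local_fit(x, xs, ys, rw, k) for x in xs]
    for _ in range(iters):
        resid = [abs(y - f) for y, f in zip(ys, fit)]
        s = statistics.median(resid)
        if s <= 1e-9 * (max(ys) - min(ys)):   # residuals negligible: nothing left to reweight
            break
        # Bisquare weights: near 1 for small residuals, 0 beyond 6 median residuals.
        rw = [(1.0 - (r / (6.0 * s)) ** 2) ** 2 if r < 6.0 * s else 0.0 for r in resid]
        fit = [local_fit(x, xs, ys, rw, k) for x in xs]
    return fit

# Simulated T series: smooth drift with one gross outlier at sample 10.
times = [float(t) for t in range(20)]
t_trend = [20.0 + 0.1 * t for t in times]
t_obs = list(t_trend)
t_obs[10] += 5.0

plain = robust_loess(times, t_obs, iters=0)   # no robustness passes
robust = robust_loess(times, t_obs, iters=3)
print(f"trend at sample 10:      {t_trend[10]:.3f}")
print(f"plain fit at sample 10:  {plain[10]:.3f}")
print(f"robust fit at sample 10: {robust[10]:.3f}")
```

The plain fit is pulled toward the bad point, while the reweighted fit effectively ignores it. That is the behaviour you want when the trended T values feed a correction to property A.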