Re: Multiple Linear Regression & Lack of Fit

DEDuvall · Jun 10, 2023 1:37 PM

I occasionally need to perform multiple linear regression calculations on accelerated test results for plastic piping. The fitted regression equations are used to extrapolate time to failure results to end use temperature and internal pressure conditions that are well outside those employed in the accelerated testing. When doing this, it is desirable to perform a lack of fit test on the resulting regression equation to assess whether this type of extrapolation is justified.

For the lack of fit test, the residual sum of squares SS(E) has to be broken down into a “lack of fit” sum of squares and a “pure error” sum of squares, which together equal the residual sum of squares. The pure error sum of squares is derived from data obtained on replicate tests under the same conditions (i.e., with the same values of the independent variables). The lack of fit sum of squares is obtained by subtraction. The mean squares are then calculated from the sums of squares and the appropriate degrees of freedom for each, and the F ratio for the test is the Lack of Fit Mean Square divided by the Pure Error Mean Square.

The Null Hypothesis for this test is that the fit to the data is adequate. If the calculated F ratio is less than the critical value of F obtained from an F table, the Null Hypothesis is accepted and one concludes that the regression equation with the fitted parameters adequately represents the data beyond the test conditions. If the F ratio exceeds the critical value, the Null Hypothesis is rejected and one is supposed to go back to square one and try a different regression equation with the data in hand, at least for the purpose of extrapolating to conditions outside the boundaries of those employed in the accelerated test of interest.
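In symbols, the breakdown can be written as below. The notation is introduced here for illustration: n observations at m distinct test conditions, n_i replicates at condition i with mean ȳ_i and fitted value ŷ_i, and p fitted coefficients.

```latex
\begin{aligned}
SS_E &= \sum_{i=1}^{m}\sum_{j=1}^{n_i}\left(y_{ij}-\hat{y}_i\right)^2
      = \underbrace{\sum_{i=1}^{m}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}_i\right)^2}_{SS_{PE}}
      + \underbrace{\sum_{i=1}^{m} n_i\left(\bar{y}_i-\hat{y}_i\right)^2}_{SS_{LOF}},\\
df_{PE} &= n-m, \qquad df_{LOF} = m-p,\\
F &= \frac{SS_{LOF}/df_{LOF}}{SS_{PE}/df_{PE}}
\end{aligned}
```

The cross term drops out because every replicate at condition i shares the same fitted value ŷ_i. The computed F is then compared with the critical value F_α(m − p, n − m) at the chosen significance level α.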

My problem so far has been that the regression analysis package we have used only provides the regression sum of squares and SSE; it doesn't break SSE down into its pure error and lack of fit components. Therefore, I have had to do that in a separate Excel spreadsheet to perform this particular analysis. My question is: are there statistics packages that will do this for me, to speed up the process when I get into this type of calculation?
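As a rough sketch of the spreadsheet step described above, the function below (a hypothetical helper using only the Python standard library) takes the SSE reported by whatever regression package was used, groups the observations by test condition to get the pure error, and obtains lack of fit by subtraction. It assumes replicates share exactly the same condition key; the resulting F ratio still has to be compared with the critical value F(df_lof, df_pe).

```python
from collections import defaultdict


def lack_of_fit_table(conditions, y, sse, n_params):
    """Split a regression's residual SS (SSE) into pure-error and lack-of-fit parts.

    conditions: one hashable key per observation identifying its test condition,
                e.g. a (temperature, stress) tuple; replicates share the same key.
    y:          observed responses, in the same order as `conditions`.
    sse:        residual sum of squares reported by the regression package.
    n_params:   number of fitted coefficients, including the intercept.
    """
    if len(conditions) != len(y):
        raise ValueError("conditions and y must have the same length")

    # Group the responses by test condition.
    groups = defaultdict(list)
    for key, value in zip(conditions, y):
        groups[key].append(value)

    n = len(y)
    m = len(groups)  # number of distinct test conditions

    # Pure error: scatter of replicates about their own condition mean.
    ss_pe = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        ss_pe += sum((v - mean) ** 2 for v in values)
    df_pe = n - m

    # Lack of fit is obtained by subtraction, as in the manual procedure.
    ss_lof = sse - ss_pe
    df_lof = m - n_params

    if df_pe <= 0 or df_lof <= 0:
        raise ValueError("need replicates (n > m) and more conditions than parameters (m > p)")
    if ss_pe == 0:
        raise ValueError("replicates are identical, so the pure error mean square is zero")

    ms_pe = ss_pe / df_pe
    ms_lof = ss_lof / df_lof
    return {
        "ss_pe": ss_pe, "df_pe": df_pe, "ms_pe": ms_pe,
        "ss_lof": ss_lof, "df_lof": df_lof, "ms_lof": ms_lof,
        "f_ratio": ms_lof / ms_pe,
    }
```

For the pipe data, each condition key would be a (temperature, stress) pair, so replicate specimens tested at the same temperature and stress fall into the same group.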

statman · Jun 22, 2020 12:02 PM

If your question is: Can JMP calculate lack-of-fit SS, the answer is yes.

https://www.jmp.com/support/help/en/15.1/?os=mac&source=application&utm_source=helpmenu&utm_medium=a...

"All models are wrong, some are useful" G.E.P. Box

DEDuvall · Jun 22, 2020 02:52 PM

Thanks for the link. I will check it out.

statman · Jun 22, 2020 12:16 PM

@DEDuvall, your statement "The fitted regression equations are used to extrapolate time to failure results to end use temperature and internal pressure conditions that are well outside of those employed in the accelerated testing." is a bit confusing. Are you saying your accelerated testing does not cover the conditions you are trying to draw conclusions about? Typically, in accelerated testing, aren't the conditions exaggerated to simulate long-term conditions? We shake, bake, and sprinkle water on the products to expose failures meant to represent failures over time. How we expose those products, and to what conditions, is typically an educated guess at first; over time we refine those acceleration models to match reality. To my knowledge, lack-of-fit tests don't really answer whether you can extrapolate the results BEYOND the inference space the study was conducted over. They are used to assess whether the model fits the data in hand.
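The distinction above can be made concrete with a standard regression diagnostic for "hidden extrapolation" (not something mentioned in this thread): compare the leverage of the prediction point, h0 = x0ᵀ(XᵀX)⁻¹x0, with the largest leverage among the test conditions. The sketch below assumes NumPy and a model matrix X built from the same transformed variables used in the fit; the function name is hypothetical. If h0 exceeds the maximum, the point lies outside an ellipsoid that contains every test condition, so the prediction is an extrapolation. An h0 at or below the maximum does not by itself guarantee interpolation.

```python
import numpy as np


def leverage_vs_design(X, x0):
    """Compare the leverage of a prediction point with the leverages of the design points.

    X:  n-by-p model matrix used in the fit (including the column of ones).
    x0: length-p model-matrix row for the condition you want to predict at.
    Returns (h0, h_max). h0 > h_max means x0 is outside the ellipsoid
    enclosing all design points, i.e. the prediction is an extrapolation.
    """
    X = np.asarray(X, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    xtx_inv = np.linalg.inv(X.T @ X)
    # Diagonal of the hat matrix X (X'X)^-1 X' without forming the n-by-n matrix.
    h_design = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    h0 = float(x0 @ xtx_inv @ x0)
    return h0, float(h_design.max())
```

Note that this check says nothing about whether the model fits; a model can pass a lack-of-fit test and still be asked to predict at a point with h0 far above h_max.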

P_Bartell · Jun 22, 2020 02:45 PM

@DEDuvall I'm going to take a slightly different tack than @statman, although I don't disagree with anything he contributed. Are you using just good old-fashioned ordinary least squares regression for your analysis, as in the JMP Standard Least Squares personality in the Fit Model platform? It wasn't 100% clear to me from your original post. If so, you might also want to check out JMP's full complement of reliability and survival data exploration and modeling techniques. Generally speaking, time-to-event data benefit from analytical methods other than OLS. The full complement of JMP's Reliability and Survival methods can be found here in the JMP online documentation:

https://www.jmp.com/support/help/en/15.1/#page/jmp/introduction-to-reliability-and-survival.shtml#ww...

DEDuvall · Jun 22, 2020 03:12 PM

The test methods I cited are applied to thermoplastic piping, where the material exhibits creep deformation and stress rupture behavior. Tests are performed at a variety of mechanical stress levels well above those the pipe would see in service, and at elevated temperatures, in order to accelerate the physical processes occurring in the material from which the pipe or fittings are made. The ASTM F2023 test adds highly chlorinated water as the pressurization medium in the pipes to simulate and accelerate the effect of chlorine disinfectants in potable water on degradation of the pipe material.

The time-to-failure data are fitted by multiple linear regression to an equation of the form y = b1 + b2*x1 + b3*x2 + b4*x1*x2, where the b's are the coefficients of the regression equation and the x's are transforms of the absolute test temperature and of the mechanical stress in the test specimens. The fitted equation is then used to estimate the lifetime of the pipe under normal end use temperatures and internal pressures (the pressures that create the mechanical stress in the pipe wall material). This approach has been used for decades without any consideration of whether the response surface described by the fitted regression equation keeps that shape when you get out to the end use temperatures and pressures. This issue is addressed in some texts, e.g., Draper & Smith, Applied Regression Analysis, 2nd Edition, Section 1.5. The plastic pipe industry is determining which products are suitable for long term use based on this type of analysis without checking whether the regression equation actually fits the data for that purpose.
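A minimal NumPy sketch of fitting this four-term equation by ordinary least squares and evaluating it at end-use conditions follows. The transforms are an assumption for illustration only (y = log10 of hours to failure, x1 = 1/T with T in kelvin, x2 = log10 of hoop stress); the post does not say which transforms the test methods prescribe, and the function names are hypothetical.

```python
import numpy as np


def design_matrix(temp_k, stress):
    """Model matrix for y = b1 + b2*x1 + b3*x2 + b4*x1*x2.

    ASSUMED transforms (illustration only): x1 = 1/T with T in kelvin,
    x2 = log10(stress).
    """
    x1 = 1.0 / np.atleast_1d(np.asarray(temp_k, dtype=float))
    x2 = np.log10(np.atleast_1d(np.asarray(stress, dtype=float)))
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])


def fit_four_term(temp_k, stress, log10_hours):
    """Ordinary least squares fit; returns coefficients b1..b4 and the SSE."""
    X = design_matrix(temp_k, stress)
    y = np.asarray(log10_hours, dtype=float)
    b, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    if rank < X.shape[1]:
        raise ValueError("test conditions do not support all four coefficients")
    resid = y - X @ b
    return b, float(resid @ resid)


def predict_log10_hours(b, temp_k, stress):
    """Evaluate the fitted equation, e.g. at end-use conditions outside the test range."""
    return design_matrix(temp_k, stress) @ b
```

The SSE returned by the fit is the quantity a lack-of-fit breakdown starts from. Nothing in the fit itself signals whether a prediction at end-use conditions is trustworthy; the code simply evaluates the fitted surface wherever it is asked to.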

Thanks for your response and for the link you provided. I will check it out.