Discussions

Intercept of a parabola

Dani · Oct 11, 2024 03:45 AM

Hi,

While using JMP's "Fit Y by X" platform to fit data with a quadratic model, I observed that the intercept obtained using JMP's centered polynomial notation for a parabola differs from the intercept obtained using the standard (uncentered) y = ax^2 + bx + c format. As I understand it, the intercept is the value of Y when x = 0, so I don't see why it should differ depending on the notation used.

I have attached an example using the equation y = 203x^2 - 3x + 1000. The uncentered model correctly indicates that the intercept is 1000, whereas the centered notation shows -125875, a very different value.
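The attached data are not reproduced here, so the following Python/NumPy sketch assumes noise-free data from y = 203x^2 - 3x + 1000 at the integers x = -49, ..., -1 (mean -25, the range MRB3855 uses as an example in this thread). It fits the same quadratic by ordinary least squares twice: once with the columns 1, x, x^2, and once with 1, x, (x - mean)^2, which is our reading of JMP's centered form (only the squared term is centered), consistent with the coefficients quoted in the thread. Both fits trace the same parabola, but their constant terms come out as 1000 and -125875.

```python
import numpy as np

# Assumed data: noise-free points on y = 203x^2 - 3x + 1000 at x = -49, ..., -1
x = np.arange(-49, 0, dtype=float)
y = 203 * x**2 - 3 * x + 1000
x_mean = x.mean()  # -25.0

# Uncentered model matrix: columns 1, x, x^2
X_raw = np.column_stack([np.ones_like(x), x, x**2])
# Centered-style model matrix: columns 1, x, (x - mean)^2 (only the squared term is centered)
X_cen = np.column_stack([np.ones_like(x), x, (x - x_mean) ** 2])

coef_raw, *_ = np.linalg.lstsq(X_raw, y, rcond=None)
coef_cen, *_ = np.linalg.lstsq(X_cen, y, rcond=None)

print("uncentered:", coef_raw)  # close to [1000, -3, 203]
print("centered:  ", coef_cen)  # close to [-125875, -10153, 203]

# Identical predictions: the two fits differ only in how the coefficients are parametrized
print(np.allclose(X_raw @ coef_raw, X_cen @ coef_cen))
```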

I have tried to find an explanation for this discrepancy in JMP's help documentation and within the community forums, but I have been unsuccessful so far. Therefore, I would appreciate it if someone could clarify what I might be missing here.

Many thanks,

Paulo

MRB3855 · Oct 11, 2024 1:54 AM

Hi @Dani. You still need to add 203*25^2 = 126875 to the centered constant to get it to match the uncentered intercept. I.e., the two equations are exactly the same if you expand carefully.
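Expanding the centered form term by term, using the centered coefficients quoted in this thread (constant -125875, linear coefficient -10153, mean of X equal to -25, so the squared term is (x - (-25))^2 = (x + 25)^2):

```latex
\begin{aligned}
y &= -125875 - 10153\,x + 203\,(x + 25)^2 \\
  &= -125875 - 10153\,x + 203\,x^2 + 10150\,x + 126875 \\
  &= 203\,x^2 - 3\,x + 1000
\end{aligned}
```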

Dani · Oct 11, 2024 05:16 AM

Thanks, MRB3855.

To further clarify, I understand that the centered and uncentered quadratic equations are mathematically equivalent. However, my concern lies with the "Intercept" term in JMP's "Parameter Estimates" table. The term "Intercept" should represent Y(0), but it is showing a different value, which I find misleading.

Why is the Intercept not the same in both cases? Should I then always calculate the correct intercept separately when using the centered notation? What about the other associated statistics (e.g., standard error, t-ratio, etc.)? Should they also be calculated separately?

If so, I find this rather strange. Therefore, my question is: Am I overlooking something?

MRB3855 · Oct 11, 2024 05:20 AM

Hi @Dani. It’s really just semantics. The “intercept” only has that meaning (y when x=0) for the uncentered model. Otherwise, it is just a constant. So it’s not really an “intercept” in the centered model, though it is labeled as such.

Victor_G · Oct 11, 2024 05:27 AM

Hi @Dani,

As @MRB3855 has explained, in the second case part of the intercept is contained in the squared term 203*(X - (-25))^2 = 203*(X + 25)^2, whose constant part is 203*25^2 = 126875.

If you add 126875 to the constant of the centered equation (-125875), you recover the intercept of the original (first) equation, 1000.
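This arithmetic generalizes: if the centered model is y = c' + b'x + a(x - m)^2, expanding the square gives a*x^2 + (b' - 2am)*x + (c' + a*m^2). A small Python sketch of that conversion (the function name is hypothetical, chosen for illustration):

```python
def centered_to_uncentered(c_cen, b_cen, a, m):
    """Convert y = c_cen + b_cen*x + a*(x - m)**2 into y = c + b*x + a*x**2.

    Expanding a*(x - m)**2 = a*x**2 - 2*a*m*x + a*m**2 moves a*m**2 into the
    constant and -2*a*m into the linear coefficient; the x^2 coefficient is unchanged.
    """
    c = c_cen + a * m**2
    b = b_cen - 2 * a * m
    return c, b, a


# Centered estimates quoted in this thread, with mean of X = -25
print(centered_to_uncentered(-125875, -10153, 203, -25))  # (1000, -3, 203)
```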

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

MRB3855 · Oct 11, 2024 05:30 AM

Hi @Dani. And if you expand the centered one and carefully combine standard errors and covariances, all p-values, etc., will match. But as they are now, they differ because they are testing different things.
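One way to "carefully combine standard errors and covariances" is sketched below, assuming ordinary least squares and simulated noisy data (the noise level, seed, and X range are invented for illustration). The uncentered intercept is the linear combination c = c' + a*m^2 of the centered estimates, so its variance is Var(c') + m^4*Var(a) + 2*m^2*Cov(c', a). Computed from the centered fit's covariance matrix, that standard error should equal the one the uncentered fit reports directly, and since the estimate also matches, so do the t-ratio and p-value.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares: coefficients and their estimated covariance matrix."""
    n, p = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - p)       # residual mean square
    cov = sigma2 * np.linalg.inv(X.T @ X)  # sigma^2 (X'X)^-1
    return coef, cov

# Simulated data (assumption): the thread's curve plus Gaussian noise
rng = np.random.default_rng(0)
x = np.arange(-49, 0, dtype=float)
y = 203 * x**2 - 3 * x + 1000 + rng.normal(scale=500.0, size=x.size)
m = x.mean()

coef_raw, cov_raw = ols(np.column_stack([np.ones_like(x), x, x**2]), y)
coef_cen, cov_cen = ols(np.column_stack([np.ones_like(x), x, (x - m) ** 2]), y)

# Uncentered intercept as a linear combination of the centered estimates:
# c = 1*c' + 0*b' + m^2*a, so Var(c) = w' Cov w with w = (1, 0, m^2)
w = np.array([1.0, 0.0, m**2])
intercept_from_cen = w @ coef_cen
se_from_cen = np.sqrt(w @ cov_cen @ w)

print(intercept_from_cen, coef_raw[0])      # same estimate
print(se_from_cen, np.sqrt(cov_raw[0, 0]))  # same standard error
```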

Dani · Oct 11, 2024 06:00 AM

Thank you both for your input. To summarize: if the actual intercept, Y(0), is important (as is the case in my line of work), I should avoid using the centered polynomial notation.

@MRB3855, when you mention that each notation tests a different aspect, could you please provide more information or references on what specifically each approach is testing?

MRB3855 · Oct 16, 2024 7:36 AM

Hi @Dani: You raise some good issues. I'll try to respond in very general terms, so there may be cases where what I write below doesn't hold. That said, in general (and in no particular order):

(1). If the value of the intercept, Y(0), is important to you, then yes, for a quick and easy way to see it, avoid using the centered polynomial (as you say). However, be very careful here. For example, say the range of X in your data is -40 to -10. Then predicting Y(0) is extrapolating (predicting outside the range of data that the model was built on), which can be very problematic for polynomials. So only do this if you have some scientific understanding that the model holds beyond your data. If your data instead span -49 to -1, then Y(0) is still an extrapolation, but a smaller one, so it may not be as problematic.

(2). In polynomial regression, the only p-value that matters is the one associated with the highest-order term; the others are just along for the ride. If the highest-order term (the squared term in your example) is "significant", you are done. If not, remove that term and refit the model.

(3). That said, each p-value tests whether or not that parameter equals zero. For the centered model, the "intercept" and X parameters don't have an interpretation that is easily intuited. One thing you will notice, though, is that when X is at its mean (-25), the squared term (X - (-25))^2 is zero, so the mean of Y there is -125875 + (-10153)*(-25) = 127950. Also notice that the p-values for the squared term are the same for both models, because they are testing the same thing (X^2 coefficient = 0, or not).

(4). For both models, and from a model matrix point of view, the intercepts are there to ensure a least squares solution. As I said in (1), the intercept is easily interpreted for the uncentered model. In the centered model, however, it doesn't have an easy interpretation; but the "intercept" is necessary for a least squares solution, so I'd just smile and let it go along for the ride.
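Points (3) and (4) can be checked numerically. The Python/NumPy sketch below uses simulated noisy data (an assumption: the thread's curve plus invented noise) and treats JMP's centered form as 1, x, (x - mean)^2. The two model matrices span the same column space, so they give the same least-squares fit; the squared term gets the same t-ratio (hence the same p-value) in both, while the "intercept" t-ratios differ because they test different hypotheses.

```python
import numpy as np

def t_ratios(X, y):
    """OLS estimates divided by their standard errors."""
    n, p = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - p)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef / se

# Simulated noisy data (assumption, for illustration only)
rng = np.random.default_rng(1)
x = np.arange(-49, 0, dtype=float)
y = 203 * x**2 - 3 * x + 1000 + rng.normal(scale=500.0, size=x.size)
m = x.mean()

X_raw = np.column_stack([np.ones_like(x), x, x**2])
X_cen = np.column_stack([np.ones_like(x), x, (x - m) ** 2])

# Both model matrices span the same 3-dimensional column space: same least-squares model
print(np.linalg.matrix_rank(np.hstack([X_raw, X_cen])))  # 3

t_raw = t_ratios(X_raw, y)
t_cen = t_ratios(X_cen, y)
print(t_raw[2], t_cen[2])  # squared term: same t-ratio in both models
print(t_raw[0], t_cen[0])  # "intercept": different t-ratios, different hypotheses
```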

Dani · Oct 14, 2024 07:35 AM

MRB3855, many thanks for your detailed answer. This is the information I was looking for. I hope it is available in the JMP help files, and that I simply overlooked it.

Victor_G · Oct 14, 2024 08:52 AM

Just to add to the excellent points made by @MRB3855: centering the factors helps reduce multicollinearity when the model contains interaction or polynomial terms. Multicollinearity makes the term coefficients more difficult and less precise to estimate (and can lead to differences in how their statistical significance is evaluated).
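A quick Python/NumPy sketch of the multicollinearity point, assuming integer X values from -49 to -1 (mean -25, the example range mentioned earlier in the thread): the raw x and x^2 columns are almost perfectly correlated, while x and the centered (x - mean)^2 column are uncorrelated for this symmetric design.

```python
import numpy as np

# Assumed design: integer X values from -49 to -1 (mean -25)
x = np.arange(-49, 0, dtype=float)

corr_raw = np.corrcoef(x, x**2)[0, 1]                 # linear vs raw squared column
corr_cen = np.corrcoef(x, (x - x.mean()) ** 2)[0, 1]  # linear vs centered squared column

print(corr_raw)  # roughly -0.97: nearly collinear
print(corr_cen)  # essentially 0 (up to floating-point rounding)
```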

Related discussions about this centering effect:
Stepwise model question
Differences in parameter estimates using same multiple regression analysis.
Scaling is also common for factor data from DoE, to make the relative importance of the factors easier to compare:
What my factors are divided for?

I hope this complementary answer helps you as well,
