Discussions

AnnaPaula · Jun 8, 2023 5:31 PM

Hi,

I am using a Generalized Regression model to analyze my data, with Binomial as the response distribution (because my response values, ErrorAT, are either 0 or 1). I am using Lasso as the estimation method.

I opted to use generalized regression because of the high correlation among some factors.

I have a quick question related to this model: one of the main factors (seedAT) was removed from the model. However, when I ran a Tukey multiple comparison test on the factor, all three levels of seedAT were statistically significantly different from one another. If the levels are significantly different from one another, shouldn't the factor be significant too?

Could someone shed some light on why the levels of the factor are statistically significantly different but the factor is still not significant to the model?

Thank you!

statman · Apr 11, 2021 10:35 AM

Statistical significance is a conditional statement. It depends on what is being compared, what is in the model and what is not, what constitutes the error terms, and what is in the inference space.

"All models are wrong, some are useful" G.E.P. Box

AnnaPaula · Apr 11, 2021 02:38 PM

Right, but the comparison here is the same. So, if the levels of the factor are significantly different in the model, shouldn't the factor itself be significant for the model?

dale_lehman · Apr 11, 2021 03:13 PM

I'm not understanding your model. Since seedAT is in some cross effects that are significant, it is only the direct effect that is not showing significance. While this seems unusual, it suggests to me that the variable's significance lies in its interaction with other variables: once those are accounted for, the effects of seedAT have been incorporated. It appears that seedAT was removed from the model. That is not insignificance, but an indication that the model is redundant (I believe; I have little experience with Generalized Regression, so I might be wrong).

statman · Apr 12, 2021 11:59 AM

It would be extremely helpful to understand the data set and how you acquired the data, but I understand you may not want to post this in a public forum. This makes it challenging to know exactly what you did or are trying to do. I don't necessarily agree with you that the comparisons are the same. One comparison is of the factor (seedAT) against the other factors, interactions, and error in the model. When you look at the differences among the levels of seedAT, you are comparing between-level variation to within-level variation. There are no other factors in this comparison, and the basis for comparison is the within-level variation (not the unassigned DFs of the model, i.e., the error).

I'm not sure about your model, but it appears you removed the 1st order effect and kept 2nd order effects. This does not follow the principle of hierarchy in model building, which is somewhat Dale's point.

One additional comment: I do not have any idea of what the differences are in practical terms. Don't forget, practical significance is always more important than statistical significance.

"All models are wrong, some are useful" G.E.P. Box

AnnaPaula · Apr 17, 2021 04:03 PM

@dale_lehman and @statman thank you for the answers

Regarding the dataset, I obtained these data by running multiple simulation experiments (1620 in total).

In each experiment, I considered the following factors: arrival time (IA_scenario), seed of arrival time (seedAT), service time (ST_scenario), seed of service time (ST), number of replications (NRepsNom), and queue model (M/M/1 or M/M/inf, in Kendall's notation from queueing theory). The factors were varied according to a full factorial design. Because I varied arrival time and service time, and this is a queueing model, the traffic intensity of the model (TINom) also varied.
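As a rough illustration of how a full factorial design like this can be enumerated, here is a minimal Python sketch. The numeric levels are hypothetical (the post does not list them), and the sketch assumes that IA_scenario and ST_scenario can be read as a mean interarrival time and a mean service time, so that traffic intensity is their ratio. Only the three levels of seedAT and the two queue models come from the thread.

```python
from itertools import product

# Hypothetical factor levels, for illustration only (not the poster's values).
factors = {
    "IA_scenario": [1.0, 2.0],            # assumed: mean interarrival time
    "seedAT": [1, 2, 3],                  # three seed levels, as in the thread
    "ST_scenario": [0.5, 0.8],            # assumed: mean service time
    "queue_model": ["M/M/1", "M/M/inf"],
}

def full_factorial(factors):
    """Return one dict per run, covering every combination of factor levels."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]

runs = full_factorial(factors)
for run in runs:
    # Traffic intensity = arrival rate / service rate
    #                   = mean service time / mean interarrival time.
    run["TINom"] = run["ST_scenario"] / run["IA_scenario"]

print(len(runs), "runs")
print(sorted({run["TINom"] for run in runs}))
```

Because traffic intensity is computed from two design factors, it is not an independent factor, which is one way the strong correlation among predictors mentioned earlier can arise.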

In this simulation model, I collected the following 4 responses: average of arrival time (AT), average of service time (ST), average number of units in the system (NIS), and average time in the system (TIS). Based on the simulation result and the theoretical result from queueing theory, I calculated the error of AT, for instance.
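For readers unfamiliar with the theoretical side, the standard steady-state results for M/M/1 and M/M/inf queues can be sketched as below. The function names and the relative-error measure are assumptions made for illustration, and "AT" is taken here to mean the mean interarrival time. The post does not say how the error was computed or how it was turned into the 0/1 response ErrorAT, so that step is not shown.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Steady-state theoretical values for an M/M/1 queue."""
    rho = arrival_rate / service_rate  # traffic intensity
    if rho >= 1:
        raise ValueError("M/M/1 has no steady state when rho >= 1")
    return {
        "AT": 1 / arrival_rate,                   # mean interarrival time
        "ST": 1 / service_rate,                   # mean service time
        "NIS": rho / (1 - rho),                   # mean number in system
        "TIS": 1 / (service_rate - arrival_rate), # mean time in system
    }

def mminf_metrics(arrival_rate, service_rate):
    """Steady-state theoretical values for an M/M/inf queue (no waiting)."""
    return {
        "AT": 1 / arrival_rate,
        "ST": 1 / service_rate,
        "NIS": arrival_rate / service_rate,  # offered load
        "TIS": 1 / service_rate,             # time in system = service time
    }

def relative_error(simulated, theoretical):
    """Hypothetical error measure: absolute relative deviation."""
    return abs(simulated - theoretical) / abs(theoretical)

theory = mm1_metrics(arrival_rate=1.0, service_rate=2.0)
simulated_at = 1.02  # made-up simulation output, for illustration only
print(theory)
print(relative_error(simulated_at, theory["AT"]))
```

Note that the theoretical AT depends only on the arrival rate, which matches the point made below that service-time factors are not needed when modeling the error of AT.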

AT is the simplest case (it is the arrival that dictates the queue system, and it does not depend on ST). Therefore, when I built the regression model, I did not include the factors related to service time. I included all other factors.

I think this is the simplest way I can describe the dataset.

Now going back to the discussion.

- Regarding the comparison. In the effect tests, the factor (SeedAT) is not being compared to other factors, right? As far as I know, we are testing the significance of the factor to the model and that's it. So, in the effect test, it is a question of whether SeedAT is statistically significant or not for the model. Comparisons of the factor with other factors and interactions are performed in the multiple comparison test.

And if this is correct, my understanding was that if all three levels of the factor are statistically significantly different, there would be no question that the factor is statistically significant. But I think my understanding was all wrong. If someone could point me to any link or material to read about it, I would really appreciate it, as the ones I have found so far did not answer my question.

- Regarding the 1st order effect vs 2nd order effect. I did not remove the 1st order effect; the generalized regression model is telling me that "it is not significant", while the second-order effects are. This is related to my question because, as you both pointed out, this does not follow the hierarchy principle.

But how then can I tell JMP to forcefully include the first-order effect SeedAT in the model? Because when I generate the regression equation, the effect will not be included in the equation.

The practical significance is a good reminder. Thank you for that!

Thank you again for the discussion and insights.

AnnaPaula · Apr 17, 2021 04:41 PM

Regarding the question of how to forcefully include the 1st order effect, I just realized the option is readily available in the "Advanced Controls" of Generalized Regression.
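To illustrate why a Lasso fit can drop a main effect while keeping its interaction, and what forcing a term amounts to conceptually, here is a minimal sketch. It is not JMP's implementation: it uses a plain Gaussian (least-squares) Lasso solved by coordinate descent rather than the binomial model from this thread, the design and response are made up, and "forcing" is modeled as giving a term a penalty weight of zero, which is an assumption about how such an option behaves.

```python
from itertools import product

def soft_threshold(z, t):
    """Shrink z toward zero by t; return exactly 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, weights, n_iter=50):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam * sum(w_j * |b_j|)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # inner product of column j with the partial residual
            norm = 0.0  # squared length of column j
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam * weights[j]) / (norm / n)
    return beta

# Hypothetical 2x2 factorial with -1/+1 coding: columns are A, B, and A*B.
design = list(product([-1.0, 1.0], repeat=2))
X = [[a, b, a * b] for a, b in design]
# Made-up response: small main effect of A, large A*B interaction, no noise.
y = [0.2 * a + 2.0 * a * b for a, b in design]

penalized = lasso_cd(X, y, lam=0.5, weights=[1.0, 1.0, 1.0])
forced = lasso_cd(X, y, lam=0.5, weights=[0.0, 1.0, 1.0])  # A is unpenalized
print("all terms penalized:", penalized)
print("A forced (zero penalty):", forced)
```

In this toy design the fully penalized fit sets the coefficient of A to exactly zero while keeping the interaction, and the fit with A unpenalized keeps A in the model. A term being dropped by the Lasso reflects shrinkage under the penalty, which is a different question from whether the factor's levels differ in a multiple comparison test.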

Factor not statistical significant in Effect Tests although there is statistical significance difference among factor levels from Tukey Multiple Comparison