Simon_Italy
Level I

Goodness of Fit

Dear Community,

I have one question on the Goodness of Fit Test.

I created a distribution of my data.

I run the Goodness of Fit test and the result is, for example: Simulated p-value = 0.1440.

I change the maximum (or the minimum) of the X-axis, re-run the Goodness of Fit test... and the Simulated p-value is different.

This seems strange to me, because I do not understand how merely changing the X-axis scale can affect the Goodness of Fit calculation.

Thanks for your feedback.

Simone

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Goodness of Fit

The key is the word 'simulated.' In some cases, a closed-form solution or approximation is unavailable for a quantity such as this p-value. In such cases, we use a simulation to derive the p-value. The simulation is based on a large number of random samples. These samples are not reproducible, but the result, if the number is large, is reasonably stable.
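To make the "simulated" point concrete, here is a minimal sketch (in Python with numpy/scipy, not JMP's internal algorithm) of how a Monte Carlo p-value for a goodness-of-fit statistic behaves: two runs over fresh random draws give slightly different p-values, while fixing the random seed makes the result reproducible.

```python
# Minimal sketch of a Monte Carlo "simulated p-value" for a normality
# GOF test. This is an illustration of the general idea, NOT JMP's
# actual procedure.
import numpy as np
from scipy import stats

def ks_stat(x, mu, sigma):
    """Kolmogorov-Smirnov distance between the empirical CDF of x
    and a Normal(mu, sigma) CDF."""
    x = np.sort(x)
    n = len(x)
    cdf = stats.norm.cdf(x, loc=mu, scale=sigma)
    return max(np.max(np.arange(1, n + 1) / n - cdf),
               np.max(cdf - np.arange(n) / n))

def simulated_p_value(data, n_sim=1000, seed=None):
    rng = np.random.default_rng(seed)
    mu, sigma = data.mean(), data.std(ddof=1)
    observed = ks_stat(data, mu, sigma)
    exceed = 0
    for _ in range(n_sim):
        sim = rng.normal(mu, sigma, size=len(data))
        # Re-estimate parameters from each simulated sample, as is
        # standard when the parameters were estimated from the data.
        if ks_stat(sim, sim.mean(), sim.std(ddof=1)) >= observed:
            exceed += 1
    return exceed / n_sim

rng = np.random.default_rng(1)
data = rng.normal(10, 2, size=100)

p1 = simulated_p_value(data)  # fresh random draws each call
p2 = simulated_p_value(data)
# p1 and p2 will usually differ slightly; increasing n_sim shrinks the
# difference. Fixing the seed makes the result exactly reproducible:
assert simulated_p_value(data, seed=42) == simulated_p_value(data, seed=42)
```

This is why relaunching the platform (which triggers a fresh simulation) can show a slightly different p-value even though the data have not changed.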


8 Replies


dlehman1
Level V

Re: Goodness of Fit

I hope you can expand on your response.  I was going to say:  I haven't used the goodness of fit option but I have often fit distributions to data using other software.  I would expect the fit to change if you fix some parameters rather than just running the goodness of fit test with no parameter restrictions.  Once you restrict the parameters, I would expect the goodness of fit to be reduced since you have introduced a constraint that was not in the original goodness of fit.  For example, if you allow the mean in a normal distribution to be fit to the data and then compare it to a fit where you restrict the mean to a particular value, I would think that the fit can only become worse, not better.  Please explain some more about this.

 

I was going to add one comment to the original question.  If I have a lot of data, I almost always find that you can reject the hypothesis that the data came from the specified distribution.  So, I rarely find the hypothesis test of interest.  I find the visualization of the fits far more informative.  Often the fit looks quite close even though the p-value is often <.0001.  The question cited a p-value of 0.144, so clearly that data is different from what I have worked with - either simulated data from a known distribution, or a better-behaved data generating process.  I would imagine that the visualization looks very close as well.  In such a case, fixing the parameters at values other than what was fit should reduce this p-value (worsen the fit) substantially, according to how I am thinking.  Is that correct?
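The large-sample point can be sketched numerically. Below is a hedged illustration (using scipy's KS test as a stand-in for JMP's GOF tests, and a hypothetical contaminated-normal process): with a very large sample, even a modest departure from normality yields an essentially zero p-value, while the same process at small n often passes.

```python
# Illustration of the "overpowered test" phenomenon: a 5% contamination
# that looks close to normal by eye is decisively rejected once n is
# large. The data-generating process here is invented for illustration.
import numpy as np
from scipy import stats

def contaminated_normal(rng, n):
    # 95% N(0,1) mixed with 5% N(0,3): visually close to a normal.
    widths = np.where(rng.random(n) < 0.95, 1.0, 3.0)
    return rng.normal(0.0, widths)

def ks_p_against_fitted_normal(x):
    # Fit mu and sigma from the data, then test against that normal.
    # (Estimating parameters first makes this nominal p-value only
    # approximate; a simulated p-value accounts for that.)
    return stats.kstest(x, 'norm', args=(x.mean(), x.std(ddof=1))).pvalue

rng = np.random.default_rng(0)
p_small = ks_p_against_fitted_normal(contaminated_normal(rng, 200))
p_large = ks_p_against_fitted_normal(contaminated_normal(rng, 100_000))
print(p_small, p_large)  # p_large is essentially zero; p_small usually is not
```

Same departure, same process: only the sample size changed, and the test flips from "fail to reject" to an extreme rejection.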

MRB3855
Super User

Re: Goodness of Fit

Hi @dlehman1: Good questions.

(1) So, I rarely find the hypothesis test of interest. I find the visualization of the fits far more informative. Often the fit looks quite close even though the p-value is often <.0001. The question cited a p value of 0.144 so clearly that data is different than what I have worked with - either simulated data from a known distribution, or a more well behaved data generating process.

     Yup, this is very common. You can get into an "overpowered" situation when your sample size is large; in this case, small (and perhaps negligible) departures from the distribution will result in rejecting the distribution. I, too, look at plots etc. to assess any meaningful departure from the hypothesized distribution.

(2) I would expect the fit to change if you fix some parameters rather than just running the goodness of fit test with no parameter restrictions.  Once you restrict the parameters, I would expect the goodness of fit to be reduced since you have introduced a constraint that was not in the original goodness of fit.  For example, if you allow the mean in a normal distribution to be fit to the data and then compare it to a fit where you restrict the mean to a particular value, I would think that the fit can only become worse, not better.  Please explain some more about this.

     In general, I'd expect the fit to be worse and that the GOF test would reflect this. Can you provide an example of where this is not the case?

 

dlehman1
Level V

Re: Goodness of Fit

I don't understand your question (2).  It sounds like we are in agreement: restricting the parameters should result in the fit being worse.  That is what I said, and it sounds like what you are saying.

MRB3855
Super User

Re: Goodness of Fit

Hi @dlehman1 : Yes, we are in agreement! I was just wondering if you have an example where this is not the case? 

dlehman1
Level V

Re: Goodness of Fit

I don't know of any such cases - but it seems to me that they should not exist.  Since the initial unrestricted fit minimizes some loss function, I would think that a restricted fit can do no better - unless the algorithm for the initial fit was suboptimal.  For example, if a local, but not global, solution was found then perhaps restricting the parameters to a different part of the decision space might find a better solution.  But conceptually, it seems to me that a restricted fit should never be better than an unrestricted fit.  Does that make sense?
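The argument above can be checked numerically, under the assumption that "fit" is measured by log-likelihood: for a normal model, fixing the mean at any value other than the sample mean can never beat the unrestricted maximum-likelihood fit, because the unrestricted fit maximizes over a strictly larger parameter set. A small sketch:

```python
# Numerical check: an unrestricted normal MLE fit always attains at
# least the log-likelihood of a fit with the mean fixed elsewhere.
import numpy as np

def normal_loglik(x, mu, sigma):
    n = len(x)
    return (-0.5 * n * np.log(2 * np.pi * sigma**2)
            - np.sum((x - mu)**2) / (2 * sigma**2))

rng = np.random.default_rng(7)
x = rng.normal(5.0, 1.5, size=500)

# Unrestricted MLE: mean and sigma both estimated from the data.
mu_hat = x.mean()
sigma_hat = x.std(ddof=0)  # the MLE of sigma uses the n divisor
ll_free = normal_loglik(x, mu_hat, sigma_hat)

# Restricted fit: mean fixed away from the sample mean, sigma
# re-optimized given that constraint.
mu_fixed = mu_hat + 0.5
sigma_restricted = np.sqrt(np.mean((x - mu_fixed)**2))  # MLE given fixed mu
ll_restricted = normal_loglik(x, mu_fixed, sigma_restricted)

assert ll_free >= ll_restricted  # the restriction can only lose likelihood
```

As noted above, the only way a restricted fit could appear better is if the unrestricted optimizer stopped at a suboptimal (e.g. local) solution.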

MRB3855
Super User

Re: Goodness of Fit

Hi @dlehman1 : Agreed. I suppose there could be some caveats:

- Parameter estimation method: MoM (Method of Moments), MLE, etc.  For the sake of discussion, say we assume a normal distribution. The MLE (and MoM estimate) for σ is different from (smaller than) the unbiased estimate that is "typically" used.  Which estimate of σ (given µ is estimated by the sample mean) would result in a better "fit" w.r.t. a GOF test?  I'm not sure. And I recognize that, in this example, both estimators are consistent, and for large n the difference between them is vanishingly small.

- If a simulation is used to assess GOF, I suppose there could be a non-zero probability that, in certain situations, our intuition here fails us (via simulation error).  But to my way of thinking, that seems very remote.
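To make the first caveat concrete: for a normal sample, the MLE of σ (n divisor) is smaller than the square root of the usual unbiased variance estimate (n − 1 divisor). The sketch below (using scipy's KS statistic as one possible measure of "fit") just computes the distance to the normal implied by each choice; which one wins can go either way from sample to sample, which is why the question above is a genuine one.

```python
# Compare KS goodness-of-fit distance under the two common sigma
# estimates for a normal sample. Neither is guaranteed to fit better
# on every sample; this only makes the comparison concrete.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, size=50)
mu = x.mean()

sigma_mle = x.std(ddof=0)           # divide by n (MLE / MoM)
sigma_unbiased_var = x.std(ddof=1)  # divide by n - 1 ("typical" estimate)

d_mle = stats.kstest(x, 'norm', args=(mu, sigma_mle)).statistic
d_unb = stats.kstest(x, 'norm', args=(mu, sigma_unbiased_var)).statistic
print(d_mle, d_unb)  # neither is smaller for every sample
```

For n = 50 the two σ estimates differ by about 1%, so the two KS distances are close, consistent with the remark that for large n the difference is vanishingly small.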

 

Loads of food for thought here! 

Simon_Italy
Level I

Re: Goodness of Fit

Dear Mark,

thanks, and sorry for my late feedback... I had some issues registering with the Community.

Now everything is OK.

Thanks again for your feedback.

Best regards,

Simone