Simon_Italy
Level I

Goodness of Fit

Dear Community,

I have a question about the Goodness of Fit test.

I created a distribution of my data.

I launch the Goodness of Fit test and the result is, for example: Simulated p-value = 0.1440.

I change the maximum (or the minimum) of the X-axis and re-launch the Goodness of Fit test... and the simulated p-value is different.

For me it's strange, because I do not understand how changing only the X-axis scale can impact the calculation of the goodness of fit.

Thanks for your feedback.

Simone

2 REPLIES

Re: Goodness of Fit

The key is the word 'simulated.' In some cases, no closed-form solution or approximation is available for a quantity such as this p-value, so we derive it by simulation based on a large number of random samples. Those samples are drawn fresh each time, so the result is not exactly reproducible, but if the number of samples is large, it is reasonably stable. Re-launching the test re-runs the simulation with new random samples, which is why the p-value changes; the change to the X-axis scale itself has no effect on the calculation.
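To make that concrete, here is a minimal sketch of how a simulated p-value can arise (in Python with numpy/scipy rather than JMP, and not necessarily the exact algorithm JMP uses): the p-value is simply the fraction of Monte Carlo samples whose test statistic is at least as extreme as the observed one, and because the samples are drawn fresh on every run, the result shifts slightly each time.

```python
import numpy as np
from scipy import stats

def simulated_p_value(data, n_sim=10_000):
    """Monte Carlo p-value for a normal goodness-of-fit test
    (Lilliefors-style: parameters re-estimated for each sample)."""
    rng = np.random.default_rng()          # unseeded: new samples every run
    mu, sigma = data.mean(), data.std(ddof=1)
    d_obs = stats.kstest(data, "norm", args=(mu, sigma)).statistic
    hits = 0
    for _ in range(n_sim):
        sim = rng.normal(mu, sigma, data.size)
        d_sim = stats.kstest(sim, "norm",
                             args=(sim.mean(), sim.std(ddof=1))).statistic
        hits += d_sim >= d_obs             # count samples at least as extreme
    return hits / n_sim

data = np.random.default_rng(1).normal(10, 2, 50)
print(simulated_p_value(data))  # e.g. 0.14 on one run, 0.15 on the next
```

Calling the function twice on the same data gives slightly different p-values, even though nothing about the data (or the axes) has changed.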

dlehman1
Level V

Re: Goodness of Fit

I hope you can expand on your response. I was going to say: I haven't used the goodness-of-fit option, but I have often fit distributions to data using other software. I would expect the fit to change if you fix some parameters rather than running the goodness-of-fit test with no parameter restrictions. Once you restrict the parameters, I would expect the goodness of fit to worsen, since you have introduced a constraint that was not present in the original fit. For example, if you let the mean of a normal distribution be estimated from the data and then compare that to a fit where the mean is fixed at a particular value, I would think the fit can only get worse, not better. Please explain some more about this.
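For what it's worth, here is a small sketch of that intuition (again Python with scipy, not JMP; the fixed mean of 9.0 is an arbitrary value chosen for illustration). Constraining the mean away from its estimate increases the Kolmogorov-Smirnov distance, i.e. worsens the fit:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.normal(loc=10.0, scale=2.0, size=200)

# Unrestricted fit: both parameters estimated from the data.
mu_hat, sigma_hat = data.mean(), data.std(ddof=1)
d_free = stats.kstest(data, "norm", args=(mu_hat, sigma_hat)).statistic

# Restricted fit: mean fixed at an arbitrary value away from the estimate.
d_fixed = stats.kstest(data, "norm", args=(9.0, sigma_hat)).statistic

print(f"free fit D = {d_free:.4f}, fixed-mean D = {d_fixed:.4f}")
# The constrained fit has the larger KS distance, i.e. the worse fit.
```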

 

I was going to add one comment on the original question. If I have a lot of data, I almost always find that you can reject the hypothesis that the data came from the specified distribution, so I rarely find the hypothesis test of interest. I find the visualization of the fit far more informative: often the fit looks quite close even though the p-value is < .0001. The question cited a p-value of 0.144, so clearly that data is different from what I have worked with: either simulated data from a known distribution, or a more well-behaved data-generating process. I would imagine the visualization looks very close as well. In such a case, fixing the parameters at values other than the fitted ones should reduce this p-value (worsen the fit) substantially, according to how I am thinking. Is that correct?
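As a quick sketch of the large-sample point (Python with scipy; the 5% contamination is an invented example, and kstest with parameters estimated from the data is only an approximation): with a million observations, even data that would look nearly normal in a histogram is rejected decisively.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# "Nearly normal" data: 95% N(0, 1) plus 5% N(0, 2) contamination.
n = 1_000_000
data = np.concatenate([rng.normal(0, 1, int(0.95 * n)),
                       rng.normal(0, 2, int(0.05 * n))])

# Fit a normal and test; a histogram with the fitted curve overlaid
# would look very close.
mu, sigma = data.mean(), data.std(ddof=1)
res = stats.kstest(data, "norm", args=(mu, sigma))
print(f"D = {res.statistic:.4f}, p-value = {res.pvalue:.3g}")
# With n this large, the p-value is essentially 0 despite the close fit.
```

At a small sample size such as n = 50, the same mixture would typically not be rejected, which is more in line with the p-value of 0.144 cited in the original question.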