Simon_Italy
Level I

Goodness of Fit

Dear Community,

I have one question on the Goodness of Fit Test.

I created the distribution of my data.

I launch the Goodness of Fit test, and the result is, for example: Simulated p-value = 0.1440.

I change the maximum (or the minimum) of the X-axis... and re-launch the Goodness of Fit test... and the simulated p-value is different.

This seems strange to me... because I do not understand how merely changing the X-axis scale can affect the calculation of the goodness of fit.

Thanks for your feedback.

Simone


8 REPLIES

Re: Goodness of Fit

The key is the word 'simulated.' In some cases, a closed-form solution or approximation is unavailable for a quantity such as this p-value. In such cases, we use simulation to derive the p-value. The simulation is based on a large number of random samples. These samples are not reproducible, but if the number of samples is large, the result is reasonably stable.
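For intuition, here is a minimal sketch of one way a simulated p-value can be computed, in Python with NumPy/SciPy, assuming a normal null with parameters estimated from the data (an illustration only, not JMP's exact algorithm). Because the reference distribution is built from fresh random samples, re-running it gives slightly different p-values:

```python
# Minimal sketch: Monte Carlo (simulated) p-value for a goodness-of-fit statistic.
# Assumption: normal null with parameters estimated from the data; not JMP's exact method.
import numpy as np
from scipy import stats

rng = np.random.default_rng()

def simulated_p_value(x, n_sim=10_000):
    """Parametric-bootstrap p-value for the Anderson-Darling statistic vs. a fitted normal."""
    mu, sigma = x.mean(), x.std(ddof=1)
    observed = stats.anderson(x, dist="norm").statistic
    exceed = 0
    for _ in range(n_sim):
        sim = rng.normal(mu, sigma, size=len(x))       # sample from the fitted normal
        if stats.anderson(sim, dist="norm").statistic >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)                  # add-one rule avoids a p-value of 0

x = rng.normal(10, 2, size=80)                         # hypothetical data
print(simulated_p_value(x), simulated_p_value(x))      # two runs differ slightly
```

Increasing `n_sim` shrinks the run-to-run variation, which is why the simulated p-value is "reasonably stable" without being exactly reproducible.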

dlehman1
Level V

Re: Goodness of Fit

I hope you can expand on your response. I was going to say: I haven't used the goodness of fit option, but I have often fit distributions to data using other software. I would expect the fit to change if you fix some parameters rather than just running the goodness of fit test with no parameter restrictions. Once you restrict the parameters, I would expect the goodness of fit to be reduced, since you have introduced a constraint that was not in the original fit. For example, if you allow the mean in a normal distribution to be fit to the data and then compare it to a fit where you restrict the mean to a particular value, I would think that the fit can only become worse, not better. Please explain some more about this.

 

I was going to add one comment on the original post. If I have a lot of data, I almost always find that you can reject the hypothesis that the data came from the specified distribution. So, I rarely find the hypothesis test of interest. I find the visualization of the fits far more informative. Often the fit looks quite close even though the p-value is often <.0001. The question cited a p-value of 0.144, so clearly that data is different from what I have worked with - either simulated data from a known distribution, or a more well-behaved data-generating process. I would imagine that the visualization looks very close as well. In such a case, fixing the parameters at values other than what was fit should reduce this p-value (worsen the fit) substantially, according to how I am thinking. Is that correct?

MRB3855
Super User

Re: Goodness of Fit

Hi @dlehman1: Good questions.

(1) "So, I rarely find the hypothesis test of interest. I find the visualization of the fits far more informative. Often the fit looks quite close even though the p-value is often <.0001. The question cited a p-value of 0.144, so clearly that data is different from what I have worked with - either simulated data from a known distribution, or a more well-behaved data-generating process."

Yup, this is very common. You can get into an "overpowered" situation when your sample size is large; in that case, small (and perhaps negligible) departures from the distribution will result in rejecting the distribution. I, too, look at plots etc. to assess any meaningful departure from the hypothesized distribution (see the sketch at the end of this reply).

(2) "I would expect the fit to change if you fix some parameters rather than just running the goodness of fit test with no parameter restrictions. Once you restrict the parameters, I would expect the goodness of fit to be reduced, since you have introduced a constraint that was not in the original fit. For example, if you allow the mean in a normal distribution to be fit to the data and then compare it to a fit where you restrict the mean to a particular value, I would think that the fit can only become worse, not better. Please explain some more about this."

In general, I'd expect the fit to be worse and that the GOF test would reflect this. Can you provide an example of where this is not the case?
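To illustrate point (1), here is a small hypothetical sketch in Python/SciPy: a large sample from a t-distribution with many degrees of freedom looks essentially normal on a histogram, yet a normality test rejects it decisively, while a small sample from the same distribution does not.

```python
# Sketch: an "overpowered" goodness-of-fit test. With large n, a tiny departure
# from normality produces a minuscule p-value even though the fit looks fine visually.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# A t-distribution with 15 df is very close to normal, but not exactly normal.
for n in (100, 200_000):
    x = rng.standard_t(df=15, size=n)
    stat, p = stats.normaltest(x)          # D'Agostino-Pearson test of normality
    print(f"n = {n:>7}:  p-value = {p:.3g}")
# Typically the small sample is not rejected, while the huge sample is rejected
# overwhelmingly, despite being practically normal for most purposes.
```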

 

dlehman1
Level V

Re: Goodness of Fit

I don't understand your question in (2). It sounds like we are in agreement: restricting the parameters should result in the fit being worse. That is what I said, and it sounds like what you are saying.

MRB3855
Super User

Re: Goodness of Fit

Hi @dlehman1 : Yes, we are in agreement! I was just wondering if you have an example where this is not the case? 

dlehman1
Level V

Re: Goodness of Fit

I don't know of any such cases - but it seems to me that they should not exist.  Since the initial unrestricted fit minimizes some loss function, I would think that a restricted fit can do no better - unless the algorithm for the initial fit was suboptimal.  For example, if a local, but not global, solution was found then perhaps restricting the parameters to a different part of the decision space might find a better solution.  But conceptually, it seems to me that a restricted fit should never be better than an unrestricted fit.  Does that make sense?
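As a concrete check of that reasoning, here is a hypothetical sketch in Python/SciPy comparing the Kolmogorov-Smirnov distance to a normal with the fitted mean versus a normal with the mean fixed away from its estimate; for a shift of this size the restricted fit is essentially always worse. (Strictly, the MLE minimizes the likelihood rather than the KS distance, which connects to the estimation-method caveat in the next reply.)

```python
# Sketch: fixing a parameter away from its estimate cannot improve the fit,
# here measured by the Kolmogorov-Smirnov distance to the data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(loc=50, scale=5, size=500)                 # hypothetical data

mu_hat, sd_hat = x.mean(), x.std(ddof=1)

# Unrestricted: both parameters estimated from the data.
ks_free = stats.kstest(x, "norm", args=(mu_hat, sd_hat)).statistic

# Restricted: mean fixed one unit (0.2 sd) away from the estimate.
ks_fixed = stats.kstest(x, "norm", args=(mu_hat + 1.0, sd_hat)).statistic

print(f"KS statistic, fitted mean: {ks_free:.4f}")
print(f"KS statistic, fixed mean:  {ks_fixed:.4f}  (larger = worse fit)")
```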

MRB3855
Super User

Re: Goodness of Fit

Hi @dlehman1: Agreed. I suppose there could be some caveats:

- Parameter estimation method: MoM (Method of Moments), MLE, etc. For the sake of discussion, say we assume a normal distribution. The MLE (and MoM estimate) for s is different from (smaller than) the unbiased estimate that is "typically" used. Which estimate of s (given that m is estimated by the sample mean) would result in a better "fit" w.r.t. a GOF test? I'm not sure. And I recognize that, in this example, both estimators are consistent, and for large n the difference between them is vanishingly small. (A small numerical sketch follows below.)

- If a simulation is used to assess GOF, I suppose there could be a non-zero probability that, in certain situations, our intuition here seems to fail us (via simulation error).  But to my way of thinking, that seems very remote. 
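Regarding the first caveat, here is a small hypothetical sketch in Python/SciPy: compute the KS distance to a fitted normal using the MLE standard deviation (divide by n) and using the n-1 estimate. Which one gives the smaller distance can go either way for a given sample, and the gap vanishes as n grows.

```python
# Sketch: MLE vs. "typical" (n-1) estimate of the standard deviation, and the
# resulting KS distance to the fitted normal. The difference shrinks as n grows.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(loc=0, scale=1, size=30)     # hypothetical small sample

mu = x.mean()
s_mle = x.std(ddof=0)                       # MLE: divide by n
s_n1 = x.std(ddof=1)                        # usual estimate: divide by n - 1

ks_mle = stats.kstest(x, "norm", args=(mu, s_mle)).statistic
ks_n1 = stats.kstest(x, "norm", args=(mu, s_n1)).statistic

print(f"s (MLE) = {s_mle:.4f},  s (n-1) = {s_n1:.4f}")
print(f"KS with MLE s:  {ks_mle:.4f}")
print(f"KS with n-1 s:  {ks_n1:.4f}")
```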

 

Loads of food for thought here! 

Simon_Italy
Level I

Re: Goodness of Fit

Dear Mark,

Thanks, and sorry for my late feedback... I had some issues with my registration to the Community.

Now everything is OK.

Thanks again for your feedback.

Best regards,

Simone