Level I

## Tolerance Intervals after Data Transformation

I'm computing Normal Tolerance Intervals and sometimes need to transform the data first. I typically try SHASH and Johnson SL. My question regards SHASH: What does it mean when the SHASH Distribution doesn't fit but the resulting transformed data are Normal? Conversely, what does it mean when SHASH fits but the resulting transformed data are not Normal? I am following the guidelines from a standards organization governing my work, which allow for a monotonic transformation to Normality, so I think that since the first case results in a Normal outcome, I should be able to use SHASH, although I am uncomfortable with not having a good fit. For the second case, I am curious about what it means for SHASH to fit when the result is not Normal. Thanks.

4 REPLIES
Super User

## Re: Tolerance Intervals after Data Transformation

Hi @jayg001 : So that I can fully understand your question(s) and respond appropriately, can you tell me if I'm understanding your two situations correctly?

Given a column of data, X:

1. "What does it mean when the SHASH Distribution doesn't fit but the resulting transformed data are Normal?"

So, in this case, X does not have a SHASH distribution and you are applying some transformation (I'll call this function SHASH) to X to get a second column Y. So, Y = SHASH[X]. Further, Y appears to be normally distributed. Do I have this correct? What function (I've called SHASH[X]) are you using to transform X? I am familiar with the SHASH distribution, but I'm just looking for clarity here.

2. "Conversely, what does it mean when SHASH fits but the resulting transformed data are not Normal?"

In this case, X has a SHASH distribution. Then, applying the SHASH transform (as in 1 above), you have Y = SHASH[X]. And further, Y does not appear to have a normal distribution.

Do I have this correct?

Level I

## Re: Tolerance Intervals after Data Transformation

Yes, you are correct.  I am saving the transformed values after fitting SHASH in the Distribution platform. The saved values formula contains the function SHASHTrans(x, gamma, delta, theta, sigma) using the parameters from the fitted SHASH distribution.
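For readers following along outside JMP, here is a minimal Python sketch of what a function with this argument list plausibly computes. It assumes the sinh-arcsinh parameterization, z = sinh(gamma + delta * asinh((x - theta) / sigma)); that formula is an assumption here, so check it against the saved column formula in JMP before relying on it. The function names `shash_trans` and `shash_inverse` and the parameter values are hypothetical, chosen only for illustration.

```python
import math


def shash_trans(x, gamma, delta, theta, sigma):
    """Assumed sinh-arcsinh transform: z = sinh(gamma + delta * asinh((x - theta) / sigma))."""
    return math.sinh(gamma + delta * math.asinh((x - theta) / sigma))


def shash_inverse(z, gamma, delta, theta, sigma):
    """Algebraic inverse of shash_trans (requires delta > 0 and sigma > 0)."""
    return theta + sigma * math.sinh((math.asinh(z) - gamma) / delta)


# Hypothetical parameter values, NOT taken from any fitted distribution.
params = dict(gamma=-0.5, delta=1.2, theta=10.0, sigma=2.0)

xs = [6.0, 9.0, 10.0, 12.5, 20.0]
zs = [shash_trans(x, **params) for x in xs]
back = [shash_inverse(z, **params) for z in zs]

for x, z, b in zip(xs, zs, back):
    print(f"x = {x:6.2f}  ->  z = {z:8.4f}  ->  back to x = {b:6.2f}")
```

Under this assumed form, the mapping is strictly increasing whenever delta and sigma are positive, so it is a monotonic transformation whatever the parameter values are. The fitted parameters only determine how close the transformed values come to a Standard Normal.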

Super User

## Re: Tolerance Intervals after Data Transformation

Hi @jayg001 : There are a few things that come into play here. In no particular order:

1. Sample Size:

- For small sample sizes, it is less likely that you will reject a plausible distribution via some Goodness-of-Fit test even when the distribution is not correct (i.e., low power). This results in accepting the wrong distribution.

- Conversely, for large sample sizes you may reject a distribution because the test is overpowered, i.e., even negligible departures from the distribution being tested will result in rejecting it. This results in rejecting a distribution that is, for all practical purposes, adequate.

2. We never really know if the distribution is Normal, or SHASH, or any other distribution. The Goodness-of-Fit test may not reject a given distribution, but that doesn't prove the distribution is correct; it only means that there is not sufficient evidence to reject it. It's like how the presumption of innocence is applied in a trial. "Not guilty" does not mean innocent. Not guilty means there was not enough evidence to convict...

3. A transformation can change how much influence individual points have on Goodness-of-Fit tests.

FWIW Edit: Generally speaking, I'm not a big fan of Goodness-of-Fit tests (see above for why). I tend to look at QQ plots etc.
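The sample-size point in item 1 can be illustrated with a small simulation. This is a rough sketch under stated assumptions: the data are a hypothetical mildly skewed mixture (a standard Normal plus half of a unit exponential), and the test is a Jarque-Bera test with its asymptotic chi-square p-value. Jarque-Bera is swapped in because it needs only the standard library; it is not one of the Goodness-of-Fit tests JMP reports.

```python
import math
import random


def jarque_bera_pvalue(data):
    """Asymptotic p-value of the Jarque-Bera normality test."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    m4 = sum((v - mean) ** 4 for v in data) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    # The chi-square survival function with 2 degrees of freedom is exp(-x / 2).
    return math.exp(-jb / 2.0)


def rejection_rate(n, reps=100, alpha=0.05, seed=1):
    """Fraction of simulated samples of size n for which normality is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Hypothetical mildly non-Normal data: Normal plus a small skewed component.
        sample = [rng.gauss(0.0, 1.0) + 0.5 * rng.expovariate(1.0) for _ in range(n)]
        if jarque_bera_pvalue(sample) < alpha:
            rejections += 1
    return rejections / reps


small_n_rate = rejection_rate(20)
large_n_rate = rejection_rate(5000)
print(f"n = 20:   normality rejected in {small_n_rate:.0%} of samples")
print(f"n = 5000: normality rejected in {large_n_rate:.0%} of samples")
```

The same slightly skewed population is rarely flagged at n = 20 and almost always flagged at n = 5000, which is why a test verdict alone says little about whether the departure matters in practice.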

These are my initial thoughts. I hope they are helpful.

Level I

## Re: Tolerance Intervals after Data Transformation

Thanks. I'm also not a fan of goodness-of-fit because I'm a dyed-in-the-wool Bayesian, but I need to develop a procedure for a large number of engineers with scant statistical training so I'm stuck with trying to keep it simple. JMP Help says that SHASHTrans transforms a SHASH-distributed variable to a Standard Normal variable so I probably will require that the SHASH distribution fits the original data and that the Normal distribution fits the transformed data before proceeding with computing and inverting tolerance intervals.
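A rough Python sketch of the "compute and invert" step described above, for anyone who wants to sanity-check results outside JMP. Everything here is an assumption rather than JMP's method: the two-sided k-factor uses Howe's approximation, the chi-square quantile uses the Wilson-Hilferty approximation (to stay within the standard library), the inverse transform assumes the sinh-arcsinh form z = sinh(gamma + delta * asinh((x - theta) / sigma)), and the parameter values and simulated z-values are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def chi2_quantile_wh(q, df):
    """Wilson-Hilferty approximation to the chi-square quantile (lower-tail prob q)."""
    z = NormalDist().inv_cdf(q)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3


def howe_k(n, coverage=0.95, confidence=0.95):
    """Howe's approximate two-sided Normal tolerance factor."""
    z = NormalDist().inv_cdf((1.0 + coverage) / 2.0)
    chi2 = chi2_quantile_wh(1.0 - confidence, n - 1)
    return math.sqrt((n - 1) * (1.0 + 1.0 / n) * z * z / chi2)


def shash_inverse(z, gamma, delta, theta, sigma):
    """Inverse of the assumed sinh-arcsinh transform (delta > 0, sigma > 0)."""
    return theta + sigma * math.sinh((math.asinh(z) - gamma) / delta)


def tolerance_interval_original_scale(z_values, params, coverage=0.95, confidence=0.95):
    """Normal tolerance interval on the transformed scale, mapped back to x."""
    m, s = mean(z_values), stdev(z_values)
    k = howe_k(len(z_values), coverage, confidence)
    # The transform is monotone increasing, so endpoints map to endpoints.
    return shash_inverse(m - k * s, **params), shash_inverse(m + k * s, **params)


# Hypothetical fitted parameters and simulated transformed values.
params = dict(gamma=-0.5, delta=1.2, theta=10.0, sigma=2.0)
rng = random.Random(0)
z_values = [rng.gauss(0.0, 1.0) for _ in range(30)]

lower, upper = tolerance_interval_original_scale(z_values, params)
print(f"k-factor for n = 30: {howe_k(30):.3f}")
print(f"95%/95% tolerance interval on the original scale: [{lower:.2f}, {upper:.2f}]")
```

Because the interval is built on the transformed scale, its validity rests on the transformed values being approximately Normal, which is consistent with requiring the Normal fit check on the transformed data before this step.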