Thierry_S
Level VI

Difficult Data Distribution: Appropriate Analysis Method and/or Transformation?

Hi JMP community,

 

I need to analyse the relationship between biomarker levels (normally distributed) and a clinical endpoint that is heavily concentrated at 0, with relatively few points away from 0 (see below).

DATA Distribution.png

 

Essentially, if I use this clinical data as a continuous response in the Fit Model platform, my residuals are not normally distributed.

I have tried some obvious transformations (e.g. cube root), but that does not solve my problem.

What would be the appropriate transformation and/or method I could use to analyze this data?

 

Thank you for your help.

TS

Thierry R. Sornasse
1 ACCEPTED SOLUTION

Accepted Solutions
dale_lehman
Level VI

Re: Difficult Data Distribution: Appropriate Analysis Method and/or Transformation?

I can't tell from your description, but it looks like most of the observations have 0 as the clinical factor you are trying to explain.  If it is actually 0, then it seems that most people do not have this clinical outcome at all, while relatively small numbers have positive and negative outcomes (of varying degrees).  If that is the case, you might try changing the response variable to a nominal variable indicating either a negative, positive, or zero outcome.  If the focus is on the size of these deviations from zero, then you can (if you have a lot of data) try a two-step process: first predict the negative, 0, or positive category, and then, conditional on a nonzero outcome, try to predict the size of the discrepancy (if you use absolute values for the discrepancy, you could use a log transform).

In any case, some additional context about the nature of the clinical variable would be helpful for soliciting ideas.
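
The two-step idea above could be prepared roughly as follows. This is only a sketch in Python rather than JMP, with made-up values; the function names `sign_category` and `two_part_responses` are hypothetical, and the actual models for each step still have to be chosen and fitted separately.

```python
import math

def sign_category(y):
    """Step 1 response: nominal category of the outcome."""
    if y > 0:
        return "positive"
    if y < 0:
        return "negative"
    return "zero"

def two_part_responses(values):
    """Split a zero-heavy response into the two responses described above.

    Returns (categories, magnitudes). categories has one label per row.
    magnitudes holds log(|y|) for nonzero rows and None for zero rows,
    so the second model only uses rows where an outcome occurred.
    """
    categories = [sign_category(y) for y in values]
    magnitudes = [math.log(abs(y)) if y != 0 else None for y in values]
    return categories, magnitudes

# Made-up illustrative values, not data from this thread.
outcome = [0, 0, 0, 2.5, 0, -1.0, 0, 0.4, 0, 0]
categories, magnitudes = two_part_responses(outcome)

for y, c, m in zip(outcome, categories, magnitudes):
    print(y, c, m)
```

The category column would be the response of a nominal model, and the log-magnitude column the response of a continuous model restricted to the nonzero rows.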


5 REPLIES 5
txnelson
Super User

Re: Difficult Data Distribution: Appropriate Analysis Method and/or Transformation?

The Distribution platform might be able to help you figure out what distribution you have and how to transform the data.  In the Distribution platform, go to the red triangle and select

     Continuous Fit==>All

JMP will determine which distribution best fits your data, and if possible, it will provide you with a transformation.
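
For readers who want to see what "best fits" can mean in practice, here is a rough Python sketch that fits two candidate distributions by maximum likelihood and ranks them by AICc. It assumes an AICc-style ranking and uses only a normal and a Laplace candidate with made-up data; it is not JMP's implementation, and neither candidate models an exact spike at zero.

```python
import math
import statistics

def aicc(log_likelihood, n_params, n):
    """Small-sample corrected AIC; lower values indicate a better fit."""
    aic = 2 * n_params - 2 * log_likelihood
    return aic + (2 * n_params * (n_params + 1)) / (n - n_params - 1)

def normal_loglik(data):
    """Log-likelihood of a normal fit at its maximum-likelihood estimates."""
    n = len(data)
    sigma = statistics.pstdev(data)  # ML estimate of sigma (divides by n)
    return -n * math.log(sigma * math.sqrt(2 * math.pi)) - n / 2

def laplace_loglik(data):
    """Log-likelihood of a Laplace fit at its maximum-likelihood estimates."""
    n = len(data)
    location = statistics.median(data)
    scale = sum(abs(x - location) for x in data) / n
    return -n * math.log(2 * scale) - n

# Made-up illustrative values, not data from this thread.
data = [0, 0, 0, 2.5, 0, -1.0, 0, 0.4, 0, 0]
n = len(data)

# Both candidates have two parameters (location and scale).
candidates = {
    "Normal": aicc(normal_loglik(data), 2, n),
    "Laplace": aicc(laplace_loglik(data), 2, n),
}
ranked = sorted(candidates.items(), key=lambda item: item[1])
for name, score in ranked:
    print(f"{name}: AICc = {score:.2f}")
```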

     

Jim
Thierry_S
Level VI

Re: Difficult Data Distribution: Appropriate Analysis Method and/or Transformation?

Thank you for your suggestion. As you guessed correctly, the data is composed of a majority of true 0 values plus a minority of values away from 0.
Best regards,
TS
Thierry R. Sornasse

Re: Difficult Data Distribution: Appropriate Analysis Method and/or Transformation?

I am not clear about the requirement that the data be normally distributed. The response should be normally distributed conditioned on the linear predictor. We usually assess this assumption, among others, by examining the plot of the residuals versus the predicted response.  What does the residual plot show?

 

Also, the residual plot is a better guide towards a transformation, if necessary.

 

Finally, Fit Least Squares provides a command to perform the Box-Cox Transformation on the response. It is a generalized power function that usually succeeds.
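
For reference, the textbook form of the Box-Cox transformation is sketched below in Python. JMP's own implementation may scale the result differently, so treat this as an illustration of the power family only. Note that the transformation is defined only for strictly positive responses.

```python
import math

def box_cox(y, lam):
    """Textbook Box-Cox transform of a single value.

    (y**lam - 1) / lam for lam != 0, and log(y) for lam == 0.
    Defined only for y > 0.
    """
    if y <= 0:
        raise ValueError("Box-Cox requires a strictly positive response")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam

# Made-up positive values for illustration.
for value in [0.5, 1.0, 4.0]:
    print(value, box_cox(value, 0.5), box_cox(value, 0))
```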

Learn it once, use it forever!
Thierry_S
Level VI

Re: Difficult Data Distribution: Appropriate Analysis Method and/or Transformation?

Dear Mark,
Thank you for your feedback. As I mentioned in my initial post, the residuals produced by Fit Least Squares were clearly not normally distributed, with a distribution similar to that of the data I showed earlier.
Of note, I tried to apply the Box-Cox Transformation, but it cannot be applied directly because the response contains zeros and negative values.

Best regards,
TS
Thierry R. Sornasse