Difficult Data Distribution: Appropriate Analysis Method and/or Transformation?


Jul 23, 2019 6:11 PM
(1880 views)

Hi JMP community,

I need to analyse the relationship between biomarker levels (normally distributed) and a clinical end point that is heavily concentrated at 0, with relatively few points away from 0 (see below).

Essentially, if I use this clinical data as a continuous variable in the Fit Model platform, my residuals are not distributed normally.

I have tried some obvious transformations (e.g. cube root), but that does not solve my problem.

What would be the appropriate transformation and/or method I could use to analyze this data?

Thank you for your help.

TS

Thierry R. Sornasse

1 ACCEPTED SOLUTION


I can't tell from your description, but it looks like most of the observations have 0 as the clinical factor you are trying to explain. If it is actually 0, then it seems that most people do not have this clinical outcome at all, while relatively small numbers have positive and negative outcomes (of varying degrees). If that is the case, you might try changing the response variable to a nominal variable indicating either a negative, positive, or zero outcome. If the focus is on the size of these deviations from zero, then you can (if you have a lot of data) try a two-step process: first predict the negative, 0, or positive category, and then, conditional on a nonzero outcome, try to predict the size of the discrepancy (if you use absolute values for the discrepancy, you could use a log transform).

In any case, some additional context about the nature of the clinical variable would be helpful for soliciting ideas.
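
The two-step idea above can be sketched roughly as follows. This is only an illustration in plain Python, not JMP output: the data values, the `sign_category` helper and the `fit_line` helper are all made up for the example, and step 1 only shows the recoding (the resulting category would still need to be modeled, for instance with a nominal logistic fit).

```python
import math

def sign_category(y):
    """Step 1 response: nominal category of the clinical value."""
    if y > 0:
        return "positive"
    if y < 0:
        return "negative"
    return "zero"

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data: biomarker level and a clinical end point that is mostly 0.
biomarker = [1.2, 0.8, 1.5, 2.9, 1.1, 3.4, 0.9, 2.7, 1.0, 3.1]
clinical = [0.0, 0.0, 0.0, 4.0, 0.0, 9.0, 0.0, -3.0, 0.0, 6.0]

# Step 1: recode the response as negative / zero / positive.
categories = [sign_category(y) for y in clinical]

# Step 2: conditional on a nonzero outcome, model log(|y|) against the biomarker.
nonzero = [(x, math.log(abs(y))) for x, y in zip(biomarker, clinical) if y != 0]
intercept, slope = fit_line([x for x, _ in nonzero], [v for _, v in nonzero])

print(categories)
print(intercept, slope)
```

Taking absolute values discards the sign in step 2, which is acceptable here only because step 1 already models the direction of the outcome.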

5 REPLIES

Re: Difficult Data Distribution: Appropriate Analysis Method and/or Transformation?

The Distribution Platform might be able to help you figure out what distribution you have and how to transform the data. In the Distribution Platform, go to the red triangle and select

Continuous Fit==>All

JMP will determine which distribution best fits your data, and if possible, it will provide you with a transformation.

Jim




Re: Difficult Data Distribution: Appropriate Analysis Method and/or Transformation?

Thank you for your suggestion. As you guessed correctly, the data is composed of a majority of true 0 values plus a minority of values away from 0.

Best regards,

TS

Thierry R. Sornasse


Re: Difficult Data Distribution: Appropriate Analysis Method and/or Transformation?

I am not clear about the requirement that the data be normally distributed. The response should be normally distributed conditional on the linear predictor, which is an assumption about the errors rather than about the raw response. We usually assess this assumption, among others, by examining the plot of the residuals versus the predicted response. What does the residual plot show?

Also, the residual plot is a better guide towards a transformation, if necessary.
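
For readers outside JMP, a minimal sketch of what goes into a residual-versus-predicted plot, assuming a single predictor and made-up data (the values and the `fit_line` helper are hypothetical, not taken from this thread):

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data: a response that is 0 for most observations.
biomarker = [1.2, 0.8, 1.5, 2.9, 1.1, 3.4, 0.9, 2.7, 1.0, 3.1]
clinical = [0.0, 0.0, 0.0, 4.0, 0.0, 9.0, 0.0, -3.0, 0.0, 6.0]

intercept, slope = fit_line(biomarker, clinical)
predicted = [intercept + slope * x for x in biomarker]
residuals = [y - p for y, p in zip(clinical, predicted)]

# Each (predicted, residual) pair is one point on the residual plot.
for p, r in zip(predicted, residuals):
    print(round(p, 3), round(r, 3))
```

With a response piled up at 0, the points for the zero observations fall on a straight line (residual = -predicted), which is the kind of pattern the residual plot makes visible.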

Finally, Fit Least Squares provides a command to perform the Box-Cox Transformation on the response. It is a generalized power function that usually succeeds.
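
As a rough illustration of the "generalized power function", here is the textbook one-parameter Box-Cox transform in Python. This is a sketch of the standard definition, not JMP's implementation (which may scale the result differently); the function name `box_cox` is chosen for the example. The definition only applies to strictly positive values.

```python
import math

def box_cox(y, lam):
    """Textbook Box-Cox transform of a single value y for parameter lam."""
    if y <= 0:
        # log(y) and fractional powers of y are undefined or complex here.
        raise ValueError("Box-Cox requires a strictly positive response")
    if lam == 0:
        return math.log(y)  # limiting case as lam approaches 0
    return (y ** lam - 1.0) / lam

print(box_cox(4.0, 0.5))  # square-root-like transform
print(box_cox(4.0, 0.0))  # log transform
```

Here `lam = 1` only shifts the data, `lam = 0.5` behaves like a square root, and `lam = 0` is the log transform.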

Learn it once, use it forever!


Re: Difficult Data Distribution: Appropriate Analysis Method and/or Transformation?

Dear Mark,

Thank you for your feedback. As I mentioned in my initial post, the residuals produced by Fit Least Squares were clearly not normally distributed, with a distribution similar to that of the data I showed earlier.

Of note, I tried to apply the Box-Cox Transformation, but it cannot be applied as is because the response contains 0 and negative values.

Best regards,

TS

Thierry R. Sornasse
