sornasst
Level III

Multivariate analysis for non-normal distribution (not transformable)?

Dear JMP community,

 

I need to analyze the effects of treatment and multiple biomarkers on a clinical outcome variable that is not normally distributed (see histogram below).

 

Distribution_Y1.png

Based on my limited knowledge, I'm pretty sure that I cannot use the Standard Least Squares model to analyze this data, since the errors generated by the model are not normally distributed. So, what would be my best option to achieve my goal?

 

Note: I'm using the standard version of JMP 11.1

 

Thank you for your help.

 

Sincerely,

2 ACCEPTED SOLUTIONS

Re: Multivariate analysis for non-normal distribution (not transformable)?

Something you can try is Neural Nets under the Analyze menu. Neural nets are a good platform for fitting non-linear data. I am not sure whether it is a standalone platform in JMP 11 or whether you will find it under Predictive Modeling. You can try several levels of the TanH function to see if you get a good fit. One thing to remember is that NN models are easily overfit if you don't use some sort of cross-validation. Use K-Fold or Holdback and try different levels. If you have a small data set, use Leave-One-Out validation. If you don't see that listed, you can use K-Fold and set the Number of Folds to the number of rows.
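
To make the last point concrete, here is a minimal sketch in plain Python (not JSL, and not how JMP implements validation internally) of how K-fold validation partitions rows, and why setting the number of folds equal to the number of rows amounts to leave-one-out. The function name `k_fold_indices` and the row counts are hypothetical, chosen only for illustration.

```python
import random

def k_fold_indices(n_rows, n_folds, seed=0):
    """Yield (train, validation) row-index lists for K-fold cross-validation."""
    if not 2 <= n_folds <= n_rows:
        raise ValueError("n_folds must be between 2 and n_rows")
    rows = list(range(n_rows))
    random.Random(seed).shuffle(rows)
    # Deal the shuffled rows into n_folds groups of near-equal size.
    folds = [rows[i::n_folds] for i in range(n_folds)]
    for held_out in folds:
        held = set(held_out)
        train = [r for r in rows if r not in held]
        yield train, held_out

# With as many folds as rows, every validation set holds exactly one row,
# which is leave-one-out validation.
loo_splits = list(k_fold_indices(n_rows=6, n_folds=6))

# An ordinary 5-fold split of 20 rows holds out 4 rows at a time.
five_fold = list(k_fold_indices(n_rows=20, n_folds=5))
```

Each row is held out exactly once in both cases; the only difference is how many rows are held out together.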

 

HTH 


Peter_Bartell
Level VIII

Re: Multivariate analysis for non-normal distribution (not transformable)?

Have you thought about using the Partition platform? You've certainly got splits in your response variable distribution just screaming at you. Biomarker predictor data is also often notoriously full of multicollinearity, so a technique such as PLS might offer some insight.
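
As a rough illustration of what a single partition step does with a continuous response, the sketch below searches one predictor for the cut point that most reduces the sum of squared errors. It is plain Python under stated assumptions: the data are made up, the helper names `sse` and `best_split` are hypothetical, and JMP's Partition platform uses its own splitting criteria and reporting.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Return (cut_point, sse_reduction) for the best binary split of y on x."""
    pairs = sorted(zip(x, y))
    total = sse([v for _, v in pairs])
    best_cut, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point exists between tied predictor values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        gain = total - sse(left) - sse(right)
        if gain > best_gain:
            # Place the cut halfway between the two neighboring x values.
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_gain = gain
    return best_cut, best_gain

# Made-up data: the response jumps from near 0 to near 1 between x = 2.5 and x = 3.5.
biomarker = [1.0, 1.5, 2.0, 2.5, 3.5, 4.0, 4.5, 5.0]
outcome = [0.05, 0.10, 0.00, 0.08, 0.95, 0.90, 1.00, 0.97]
cut, gain = best_split(biomarker, outcome)
```

A bimodal response like the one in the histogram tends to produce a large reduction at the first split, which is the sense in which the splits are "screaming".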


7 REPLIES
dale_lehman
Level VII

Re: Multivariate analysis for non-normal distribution (not transformable)?

I think it is hard to respond generally to your question (at least it's hard for me) without more context. But looking at this distribution of your clinical outcome makes me think you might want to consider this as a discrete outcome. The distribution is strongly bimodal. Might it be possible to group all values < 0.5 into the 0 bin and all values > 0.5 into the 1 bin, and then analyze it as a classification problem? Of course, if your sample size is large enough, linear models might provide a reasonable estimate of the mean, but with a bimodal outcome like this, the mean is not of particular interest.
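
A minimal sketch of that grouping in plain Python, assuming a 0.5 threshold and made-up outcome values. The post does not say where a value of exactly 0.5 should go; assigning it to the 1 bin here is an arbitrary assumption, and `dichotomize` is a hypothetical helper name.

```python
def dichotomize(values, threshold=0.5):
    """Map each continuous outcome to 0 (below threshold) or 1 (at or above it)."""
    return [1 if v >= threshold else 0 for v in values]

outcome = [0.02, 0.10, 0.47, 0.55, 0.91, 0.98]  # made-up values
labels = dichotomize(outcome)

# Class counts are worth checking before fitting a classifier,
# since a badly unbalanced split changes how the fit should be judged.
counts = {label: labels.count(label) for label in (0, 1)}
```

The resulting 0/1 column can then be treated as a nominal response in a classification model.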

sornasst
Level III

Re: Multivariate analysis for non-normal distribution (not transformable)?

Hi Dale,

 

Thank you for your suggestion. I had already experimented with binning the clinical outcome variable into 3 levels, which seems to perform well with the Ordinal Logistic model.

 

Best,

 

 


sornasst
Level III

Re: Multivariate analysis for non-normal distribution (not transformable)?

Hi Peter,

Thanks for suggesting the Partition Platform. Is there a way to capture the significance of the parameters used to split the response?

Thank you for your help.

Sincerely,

Peter_Bartell
Level VIII

Re: Multivariate analysis for non-normal distribution (not transformable)?

@sornasst:

 

I would definitely look at the basic response plot, Column Contributions, and LogWorth values that populate many of the different analysis platform report windows. I also heartily endorse using all the cross-validation techniques that my colleague @Bill_Worley mentions. You can overfit a regression tree just as easily as you can other modeling methods. Here's a link to the general online JMP documentation for the Partition platform. You are using JMP 11 and this documentation is for version 13, so some of the options or menus may be different or nonexistent, but it can at least get you started.
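
For reading LogWorth values: LogWorth is the negative base-10 logarithm of a p-value, so a LogWorth of 2 corresponds to p = 0.01 and larger values indicate stronger evidence. The short Python sketch below only shows that conversion; the helper names are hypothetical, and the p-values that JMP reports for candidate splits are computed (and adjusted) by the platform itself.

```python
import math

def logworth(p_value):
    """Convert a p-value to LogWorth, defined as -log10(p)."""
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    return -math.log10(p_value)

def p_from_logworth(value):
    """Invert the conversion: p = 10 ** (-LogWorth)."""
    return 10 ** (-value)

# Made-up p-values for three candidate predictors.
p_values = {"biomarker_a": 0.0004, "biomarker_b": 0.03, "treatment": 0.2}
ranked = sorted(p_values, key=lambda name: logworth(p_values[name]), reverse=True)
```

Ranking by LogWorth gives the same order as ranking by p-value from smallest to largest, but the scale is easier to compare when p-values are very small.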

 

http://www.jmp.com/support/help/13-2/Partition_Models.shtml#

 

Kevin_Anderson
Level VI

Re: Multivariate analysis for non-normal distribution (not transformable)?

...and, in addition to the modeling methods mentioned above, I want to call your attention to the Assess Variable Importance (a.k.a. the Sobol'izer) pull-down in the Prediction Profiler.

 

image.png

 

Neural nets, in particular, are notorious for being "black boxes" that don't provide information about variable importance, and the Sobol'izer remedies that issue pretty righteously.  A reference is attached.
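
For intuition about what a Sobol-style importance index measures, here is a generic Monte Carlo sketch of first-order (main effect) indices in plain Python. It is not JMP's implementation: it assumes independent inputs drawn uniformly from 0 to 1, uses a standard pick-and-freeze estimator, and the toy model and all names are hypothetical stand-ins for a fitted prediction formula.

```python
import random

def first_order_sobol(model, n_inputs, n_samples=20000, seed=1):
    """Estimate first-order Sobol indices for inputs drawn independently from U(0, 1)."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    b = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    f_a = [model(row) for row in a]
    f_b = [model(row) for row in b]
    # Mean and variance of the output, estimated from both sample sets.
    outputs = f_a + f_b
    mean = sum(outputs) / len(outputs)
    variance = sum((v - mean) ** 2 for v in outputs) / (len(outputs) - 1)
    indices = []
    for i in range(n_inputs):
        total = 0.0
        for row_a, row_b, ya, yb in zip(a, b, f_a, f_b):
            # Copy of row_a in which only input i is taken from row_b.
            mixed = row_a[:i] + [row_b[i]] + row_a[i + 1:]
            # Centering yb does not change the expectation but reduces noise.
            total += (yb - mean) * (model(mixed) - ya)
        indices.append(total / n_samples / variance)
    return indices

def toy_model(x):
    """Hypothetical response: depends on inputs 0 and 1, ignores input 2."""
    return x[0] + 2.0 * x[1]

main_effects = first_order_sobol(toy_model, n_inputs=3)
```

For this toy model the exact main-effect shares are 0.2, 0.8, and 0, so the estimates should land near those values, with the unused third input receiving no importance.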