mgerusdurand
Level IV

Analyzing correlation of non normal distributed data from Sequencing analyses

Hello,

I may not have dug deep enough, but I can't find the answer in the community.

We have NGS-like data (expression of genes) that we would like to correlate between different sample zones. The data are not normally distributed.

We have received various suggestions that we should use a log transformation to get a normal distribution.

We want to study the correlation. Can we do it on non-normally distributed data, or do we need to log-transform the data first?

The correlations are not the same with one method or the other, so we want to be sure we are doing it the right way, as we are not statisticians.

Thanks to anyone who can help us on that point.

 

Have a nice day

 

Marie

MGD
1 ACCEPTED SOLUTION


Re: Analyzing correlation of non normal distributed data from Sequencing analyses

Hi @mgerusdurand , thank you for your question!

Is your data sparse (a lot of zeros)? If it is not very sparse, then generally, as with RNA-seq data, we would recommend transforming the data (e.g. a log transform) and then performing correlation analysis, dimension reduction and visualization (PCA, UMAP, t-SNE), and clustering.

There are also different types of correlation: Pearson correlation, Spearman's correlation, Kendall's tau, etc. Pearson measures linear association, while Spearman and Kendall are rank-based and do not assume normality. Depending on your research question, you may choose a different method.
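To make the difference between these choices concrete, here is a minimal sketch in plain Python. The expression values are made up for illustration (they are not from the original poster's data), and the helper names `pearson`, `average_ranks` and `spearman` are hypothetical, written out by hand rather than taken from any library. It compares Pearson on raw values, Pearson on log2(x + 1) values, and Spearman (Pearson on ranks) when one sample pair is an extreme outlier.

```python
import math

def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical expression of one gene in two zones; the last pair is an outlier
zone_a = [12, 30, 18, 45, 25, 60, 35, 4000]
zone_b = [40, 15, 55, 20, 35, 10, 50, 3000]

# log2(x + 1): the +1 pseudo-count is an assumption that keeps zeros finite
log_a = [math.log2(v + 1) for v in zone_a]
log_b = [math.log2(v + 1) for v in zone_b]

r_raw = pearson(zone_a, zone_b)    # dominated by the single outlier pair
r_log = pearson(log_a, log_b)      # outlier influence is reduced, not removed
rho_raw = spearman(zone_a, zone_b)
rho_log = spearman(log_a, log_b)   # same ranks, so same value as rho_raw

print(f"Pearson raw:  {r_raw:.3f}")
print(f"Pearson log:  {r_log:.3f}")
print(f"Spearman raw: {rho_raw:.3f}")
print(f"Spearman log: {rho_log:.3f}")
```

In this toy example the raw Pearson value is driven almost entirely by the one extreme pair, while Spearman looks only at the ordering and is unchanged by the log transform, because the log is a monotonic function. Which number is "right" depends on whether you care about a linear relationship in the raw scale, in the log scale, or only about the ordering.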

 

 


4 REPLIES


mgerusdurand
Level IV

Re: Analyzing correlation of non normal distributed data from Sequencing analyses

Thanks @MeichenDong !

 

We don't have a lot of zeros, and these are not standard RNA-seq data because they are spatially resolved. I don't want to "hide" outliers with a log transformation, so I don't get the recommended transformation. Pearson correlation on non-parametric data sounds like it answers our question in this particular setting.

 

Thanks for your help

MGD

Re: Analyzing correlation of non normal distributed data from Sequencing analyses

I am also having the same issue.

Re: Analyzing correlation of non normal distributed data from Sequencing analyses

Hi @CrossoverBird55, is there a specific problem that we could help with?

The way to transform or normalize the data also depends on the downstream analysis plans.
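As a rough illustration of why the downstream analysis matters, the sketch below (plain Python, made-up numbers, and a hypothetical helper `share_of_largest`) measures how much of the total squared deviation from the mean comes from the single largest value, before and after a log2(x + 1) transform. Variance-based methods such as PCA or Pearson correlation are sensitive to this share; rank-based methods are not.

```python
import math

def share_of_largest(values):
    """Fraction of the total squared deviation from the mean due to the largest value."""
    mean = sum(values) / len(values)
    squared = [(v - mean) ** 2 for v in values]
    return squared[values.index(max(values))] / sum(squared)

# Hypothetical right-skewed expression values (made up for illustration)
expression = [10, 20, 40, 80, 160, 320, 640, 1280]

# log2(x + 1): the +1 pseudo-count is an assumption that keeps zeros finite
log_expression = [math.log2(v + 1) for v in expression]

raw_share = share_of_largest(expression)
log_share = share_of_largest(log_expression)

print(f"Share of variation from the largest value, raw: {raw_share:.2f}")
print(f"Share of variation from the largest value, log: {log_share:.2f}")
```

On the raw scale the largest value accounts for most of the variation, whereas on the log scale the contributions are spread more evenly across samples. If the downstream analysis is rank-based, this transformation does not change the result, since the ordering of the values is preserved.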