New single-cell RNA-sequencing workflow in JMP Genomics 10
Sep 26, 2020 9:52 AM
| Last Modified: Sep 30, 2020 6:52 AM
The advent of single-cell RNA sequencing (scRNA-seq) provides unprecedented opportunities for exploring gene expression profiles at the single-cell level. scRNA-seq has become a preferred choice for studying cell heterogeneity, especially in oncology and immunology. However, it also poses new challenges in data visualization and statistical analysis because of its high dimensionality, sparsity, varying heterogeneity across cell populations, technical noise, and limited reproducibility.
Next-generation sequencing-based technologies for genomics, transcriptomics, and epigenomics are now increasingly focused on the characterization of individual cells. scRNA-seq, for example, can comprehensively characterize transcriptional changes at the cellular level and help to better understand the function of an individual cell in the context of its microenvironment. Recently, it has been used in COVID-19 research to characterize transcriptional changes in individual immune cells.
The newly added scRNA-seq workflow in JMP Genomics 10 helps you to visually and interactively explore complex and rare cell populations through clustering analysis and then detect the differential gene expression patterns across cell types and other conditions.
Why JMP Genomics?
In JMP Genomics 10, we created a standardized, interactive, and reproducible scRNA-seq workflow for scientists to efficiently explore their scRNA-seq data with great confidence. Key features of this workflow include data quality control (QC), variable gene selection, sparse SVD analysis, clustering, feature importance screening, dynamic visualizations (e.g., violin plot, ridgeline plot, and dot plot), as well as differential gene expression analysis. In addition, it utilizes the R integration feature in JMP to perform t-SNE or UMAP visualizations on the cell populations when appropriate R packages are installed.
This article shows you the key features in this new workflow by analyzing a PBMC data set as an example. First, you will need to use the Feature-Barcode matrices importer located in the Import>Next-Gen Sequencing menu in JMP Genomics to read in the raw UMI count data in sparse format and then convert it to dense format. Output data from the importer can then be used as the input for the basic Single-cell RNA-seq workflow found under the Workflows>Basic submenu. After specifying the settings in the interface, click Run to generate a tabbed report organized in a JMP Project.
Figure 1: Interface of the Basic ScRNA-Seq Workflow.
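To make the sparse-to-dense conversion concrete, here is a minimal Python sketch. It is not the JMP Genomics importer. It assumes the sparse input is a list of 1-based (gene, cell, count) triplets, as in the Matrix Market coordinate layout commonly used for feature-barcode matrices. The function name and the toy data are hypothetical.

```python
def coo_to_dense(n_genes, n_cells, triplets):
    """Expand (gene, cell, count) triplets into a dense genes x cells table."""
    dense = [[0] * n_cells for _ in range(n_genes)]
    for gene, cell, count in triplets:
        # Indices are assumed to be 1-based, so shift them to 0-based.
        dense[gene - 1][cell - 1] = count
    return dense


# Hypothetical toy data: 3 genes x 4 cells, only nonzero UMI counts stored.
triplets = [(1, 1, 5), (1, 3, 2), (2, 2, 1), (3, 4, 7)]
dense = coo_to_dense(3, 4, triplets)
for row in dense:
    print(row)
```

Every entry that is absent from the triplet list becomes an explicit zero, which is why dense tables for real scRNA-seq data are much larger than their sparse counterparts.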
Data Overview and QC
The workflow performs QC steps to remove low-quality cells and genes. A data overview report will help you decide which QC criteria to use. (You can explore this visualization and others in this post at JMP Public.)
Figure 2: Data overview report with key statistics of this data set.
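As an illustration of what such QC criteria can look like, the Python sketch below drops cells with too few detected genes and then drops genes detected in too few of the remaining cells. The thresholds, function name, and toy counts are assumptions for illustration only; the actual criteria offered by the workflow may differ.

```python
def qc_filter(counts, min_genes_per_cell=2, min_cells_per_gene=2):
    """Return (kept gene indices, kept cell indices) for a genes x cells count table."""
    n_genes, n_cells = len(counts), len(counts[0])
    # Keep cells in which enough genes have a nonzero count.
    keep_cells = [
        c for c in range(n_cells)
        if sum(1 for g in range(n_genes) if counts[g][c] > 0) >= min_genes_per_cell
    ]
    # Keep genes that are detected in enough of the retained cells.
    keep_genes = [
        g for g in range(n_genes)
        if sum(1 for c in keep_cells if counts[g][c] > 0) >= min_cells_per_gene
    ]
    return keep_genes, keep_cells


# Hypothetical toy counts: 4 genes (rows) x 4 cells (columns).
counts = [
    [5, 0, 2, 0],
    [1, 1, 0, 0],
    [0, 3, 4, 0],
    [0, 0, 0, 1],
]
keep_genes, keep_cells = qc_filter(counts)
print("genes kept:", keep_genes)
print("cells kept:", keep_cells)
```

In this toy table the last cell expresses only one gene and is removed, which in turn leaves the last gene with no expressing cells.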
Variable Gene Selection
One way to deal with the high dimensionality and sparsity in scRNA-seq data is to focus solely on genes that exhibit high cell-to-cell variation. To facilitate this, the workflow offers two variable gene selection methods, Dispersion and VST, to help you select a set of highly variable genes for subsequent analyses. The Variable Gene Selection tab shows you a variance-mean plot with the selected variable genes colored red.
Figure 3: Variance-mean plot of the VST method. Red dots represent selected variable genes; the remaining genes are discarded.
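The idea behind dispersion-based selection can be sketched in a few lines of Python. This is a simplified illustration that ranks genes by the variance-to-mean ratio; it does not reproduce the Dispersion or VST implementations in JMP Genomics, and the gene names and values are made up.

```python
from statistics import mean, variance


def top_dispersion_genes(expr, n_top):
    """Rank genes by variance / mean across cells and return the top n_top names."""
    scores = {}
    for gene, values in expr.items():
        m = mean(values)
        if m > 0:  # skip genes with no expression to avoid dividing by zero
            scores[gene] = variance(values) / m
    return sorted(scores, key=scores.get, reverse=True)[:n_top]


# Hypothetical expression values for 4 genes across 4 cells.
expr = {
    "GENE_A": [0.0, 0.0, 10.0, 12.0],
    "GENE_B": [5.0, 5.0, 6.0, 5.0],
    "GENE_C": [0.0, 8.0, 0.0, 9.0],
    "GENE_D": [1.0, 1.0, 1.0, 1.0],
}
selected = top_dispersion_genes(expr, n_top=2)
print(selected)
```

Genes that are expressed at a steady level in every cell (GENE_B and GENE_D here) score low, while genes that switch on in only some cells score high.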
Sparse SVD Analysis
The workflow then performs a sparse SVD analysis to further reduce the dimensionality and sparsity of the data set for clustering and visualization purposes. The Clustering tab provides both a 2D and a 3D SVD plot of the top components to help you explore the global structure in your data.
Figure 4: 3D SVD plot visualizing the top three components.
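For readers unfamiliar with SVD-based dimension reduction, the sketch below extracts the leading singular component of a small cells x genes matrix with plain power iteration. This is a stand-in chosen for brevity, not the sparse SVD algorithm used by the workflow, and the function name and toy matrix are hypothetical.

```python
import math


def top_singular_component(matrix, n_iter=200):
    """Power iteration on A^T A; returns (singular value, gene loadings, cell scores)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(n_iter):
        av = [sum(a * b for a, b in zip(row, v)) for row in matrix]  # A v
        atav = [sum(matrix[i][j] * av[i] for i in range(n_rows)) for j in range(n_cols)]  # A^T (A v)
        norm = math.sqrt(sum(x * x for x in atav))
        v = [x / norm for x in atav]
    # Projecting each cell onto the loading vector gives its component score.
    scores = [sum(a * b for a, b in zip(row, v)) for row in matrix]
    sigma = math.sqrt(sum(s * s for s in scores))
    return sigma, v, scores


# Hypothetical toy matrix: 4 cells (rows) x 3 genes (columns).
matrix = [
    [8.0, 6.0, 0.0],
    [9.0, 7.0, 0.0],
    [0.0, 1.0, 3.0],
    [0.0, 0.0, 2.0],
]
sigma, loadings, scores = top_singular_component(matrix)
print("cell scores on component 1:", [round(s, 2) for s in scores])
```

Each cell is summarized by one score per component instead of one value per gene. Plotting the scores for the top two or three components gives the kind of 2D and 3D views described above.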
One important task in scRNA-seq data analysis is identifying cell populations through a clustering algorithm. This workflow provides two clustering algorithms, hierarchical and K-means. In this example, we used hierarchical clustering, which identified nine clusters in the PBMC data set. You can also explore clusters through a dendrogram or a constellation plot.
Figure 5: Dendrogram of hierarchical clustering.
Figure 6: Constellation plot of hierarchical clustering.
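The merging process that a dendrogram displays can be sketched as follows. This is a minimal agglomerative clustering example using average linkage on a handful of made-up 2D points (standing in for cells in SVD space). The linkage method and the names are assumptions; the workflow's own hierarchical clustering settings may differ.

```python
import math


def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering; returns clusters as lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between two clusters.
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical cell coordinates on the first two components.
points = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.1, 4.8), (10.0, 0.0), (9.8, 0.3)]
clusters = agglomerative(points, n_clusters=3)
print(clusters)
```

Stopping the merging at a chosen number of clusters corresponds to cutting the dendrogram at a given height.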
UMAP and t-SNE Visualization
UMAP and t-SNE are two popular visualization methods used in scRNA-seq data exploration. JMP Genomics offers an option to visualize your clustering result with t-SNE and UMAP, provided you have the Rtsne and UMAP packages installed in your local R environment. JMP Genomics also offers a tool called Feature Switcher to help you interactively step through gene expression patterns in t-SNE and UMAP visualizations without saving dozens of figures to your hard drive. Figure 7 uses UMAP as an example.
Figure 7: UMAP visualization with Local Data Filter and Feature Switcher
JMP Genomics also offers additional visualization tools allowing you to dynamically and interactively explore gene expression patterns across cell types, including violin plots, ridgeline plots, and dot plots. Users can define cell populations based on the expression pattern of a list of marker genes through a Recode button. Additionally, users can check information about genes of interest through links to external databases.
Figure 8: Violin plot, ridgeline plot, and dot plot of gene expression levels across cell types.
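A dot plot typically summarizes each gene and cell type with two numbers: the mean expression and the fraction of cells in which the gene is detected. The Python sketch below computes both for one gene. It assumes that convention rather than documenting how JMP Genomics builds its plot, and the cell-type labels and values are invented.

```python
from collections import defaultdict


def dot_plot_stats(expression, cell_types):
    """Per cell type: mean expression and fraction of cells with nonzero expression."""
    grouped = defaultdict(list)
    for value, cell_type in zip(expression, cell_types):
        grouped[cell_type].append(value)
    return {
        cell_type: {
            "mean": sum(values) / len(values),
            "fraction_expressing": sum(1 for x in values if x > 0) / len(values),
        }
        for cell_type, values in grouped.items()
    }


# Hypothetical expression of one marker gene in six cells.
expression = [0.0, 2.0, 4.0, 0.0, 0.0, 1.0]
cell_types = ["T cell", "T cell", "T cell", "B cell", "B cell", "B cell"]
stats = dot_plot_stats(expression, cell_types)
for cell_type, summary in stats.items():
    print(cell_type, summary)
```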
Feature Importance Screening
Although the variable gene selection step significantly reduced the number of genes in the analyses, there are still thousands of genes left to be explored. Therefore, JMP Genomics offers a Feature Screening method using a bootstrap forest algorithm to rank the genes based on their importance in cell separation. Users can select, for example, the top 30 genes for visualization instead of 2,000. JMP Genomics also provides a tool that helps send this list of important genes to the GTEx database to explore their tissue-specific expression patterns.
Figure 9: Feature Screening report showing the top 30 genes.
Differential Gene Expression
Differential gene expression analysis is a key step in deriving statistical insights about transcriptional changes across cell types from scRNA-seq data. This workflow integrates the ANOVA platform in JMP Genomics to perform differential gene expression analysis using a mixed-effects model that accommodates complicated study designs.
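To give a feel for what a per-gene comparison between two cell types involves, the sketch below ranks genes by a Welch t-statistic. This is a deliberately simplified stand-in and not the mixed-effects ANOVA used by the workflow: it handles only two groups and no random effects. The gene names and values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch t-statistic for the difference in mean expression between two groups."""
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical normalized expression: gene -> (cells of type A, cells of type B).
data = {
    "GENE_X": ([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]),
    "GENE_Y": ([2.0, 3.0, 4.0], [2.0, 4.0, 3.0]),
}
t_stats = {gene: welch_t(a, b) for gene, (a, b) in data.items()}
# Rank genes by the magnitude of the statistic, largest first.
ranked = sorted(t_stats, key=lambda gene: abs(t_stats[gene]), reverse=True)
print(ranked)
```

A real analysis would also convert the statistics to p-values and adjust them for multiple testing across thousands of genes.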
This five-minute introductory video about this new single-cell RNA-seq workflow will get you started.
As scRNA-seq continues to gain popularity in biomedical research, a reproducible, interactive, and user-friendly workflow is needed to quickly extract valid insights from rapidly growing scRNA-seq data sets. The new scRNA-seq workflow in JMP Genomics 10 helps you expedite your research with greater convenience and confidence. Moreover, JMP Genomics 10 takes advantage of the enhanced predictive modeling platform in JMP Pro and provides better integration with open-source packages in R and Python. Go to JMP Genomics for more information.
Be sure to check out the following presentations about scRNA-seq workflow and other exciting features in JMP Genomics: