dagenu
Staff
Expression Analysis with JMP Genomics, Part 3: Basic Expression Workflow

JMP Genomics provides analytical pipelines, called Workflows, that run a series of analyses on a data set. The Basic Expression Workflow automates a set of quality control and normalization methods, followed by ANOVA, to perform a basic expression analysis. For this workflow, we will inspect data from an experiment in which MCF7 cells, a breast cancer cell line, are exposed to estrogen and measured at three different timepoints. These cells reproduce rapidly when exposed to estrogen, and expression levels are expected to rise with continued exposure. This series of analyses will give a distribution of the data at each timepoint under both control and experimental conditions, compare expression levels for each, and create a standardized data set for use with further analytical tools.
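
As a quick sanity check on the layout of this experiment, the following Python sketch enumerates the design: two groups, three timepoints, and three replicate measurements per combination (the replicate count comes from step 1 of the walkthrough). The group labels and field names are placeholders chosen for illustration, not names taken from the data set.

```python
from itertools import product

# Placeholder labels: the article states only that there are two groups
# (control/experimental), three timepoints, and three replicates each.
groups = ["control", "estrogen"]
timepoints_hr = [12, 24, 48]
replicates = [1, 2, 3]

# One experimental condition per (group, timepoint) pair.
conditions = list(product(groups, timepoints_hr))

# One measurement (one data column) per (group, timepoint, replicate).
measurements = [
    {"group": g, "time_hr": t, "replicate": r}
    for (g, t), r in product(conditions, replicates)
]

print(len(conditions), "conditions,", len(measurements), "measurements")
```

The counts agree with the six experimental conditions and 18 measurements described for the input data.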

  1. Open the e2_expression_edf_data.sas7bdat data set and inspect the structure of the experiment. Each column represents one measurement for a group (experimental/control) at a specified time (12 hrs/24 hrs/48 hrs). Three measurements are taken for each group at each timepoint, for a total of 18 measurements across six experimental conditions. [Figure: 1_input_data.png]
  2. From the Genomics Starter menu, select Workflows > Basic > Basic Expression Workflow to bring up the dialog box.
  3. In the General tab, choose e2_expression_edf_data.sas7bdat as the Input SAS Data Set.
  4. From the list of Available Variables, select VAR1 as the Label Variable.
  5. By Variables can be included to separate the output by different levels of a specified variable. This data set, however, does not have any of these variables. Furthermore, a Chromosome Variable and/or a Position Variable can be included if annotation information is included in the data set.
  6. Choose an Output Folder and set the Workflow Output Name to E2_Treatment for easy file identification. [Figure: 2_general.png]
  7. In the Experimental Design tab, select e2_expression_edf.sas7bdat as the Experimental Design Data Set. This data set contains information about the design of the experiment and is required for analysis. For more information on how this data set was created see: Expression Analysis with JMP Genomics, Part 1: Experimental Design.
  8. The Color Variable will color output according to levels of the specified columns. For this option, select Characteristics.
  9. Select ColumnName as the Label Variable. This column corresponds to the column names in the Input Data Set.
  10. Select both Time and Characteristics for the Variables Defining Plotting Groups to separate plots by the different time points and treatment groups.
  11. Variance Component Effects can be included to specify columns that define sources of variability, and these effects can be adjusted before modeling by using Adjustment Effects. The output from these fields is a pie chart showing the sources of variability in the data. [Figure: 3_ED.png]
  12. On the QC and Normalization tab, select the types of QC Analysis to perform. For this example, select both Distribution Analysis and Correlation and Principal Variance Components Analysis.
  13. In the Normalization box, select STD as the Normalization Method.
    • This is a quick and easy way to standardize the expression levels for each experimental group in your data set. The ? icon next to Normalization Method gives a description of the four normalization method options. [Figure: 4_QCandNorm.png]
  14. In the ANOVA tab, select Time & Characteristics as the Class Variables. These variables form distinct categories in the model.
  15. List the fixed effects in the Model these Fixed Effects box as “Time|Characteristics”.
    • Including the bar operator “|” models the interaction effect for Time & Characteristics as well as both variables as main effects. This is equivalent to entering “Time Characteristics Time*Characteristics”. [Figure: 5_ANOVA.png]
  16. Under the LSMeans tab, enter “Time|Characteristics” in the field labeled Estimate LSMeans for these Fixed Effects.
  17. For LSMeans Difference Set for Volcano Plots, select Simple Differences. This will denote differences between the LS Means effects entered in step 16, but only when the levels of those variables have changed.
    • To denote the difference between all possible pairs of levels, select All Pairwise Differences, or to take differences vs a single reference level, select Differences with a Control.
  18. Check the box next to Cluster significant LSMean profiles. This will perform hierarchical clustering for each of the specified variables from step 16 that are calculated to have a significant difference.
  19. To ensure the proper differences are calculated in the output, select Difference Chooser (highlighted in red below). This will open a new window showing the default differences that will be calculated for use in the volcano plots. [Figure: 6_LSMeans.png]
  20. In the Difference Chooser window, notice that the default operation is to subtract the timepoint with longer E2 exposure from the earlier timepoint. We want the opposite of this. Click the Reverse All button to swap the order. 
    • Optionally, select which comparisons to include in your output by checking or unchecking the Include boxes next to each comparison.
  21. Click Save. [Figure: 7_diff_chooser.png]
  22. In the Multiple Testing tab, select a Multiple Testing Method. For this example, we will use FDR to control the false discovery rate.
  23. Select an Alpha. Here, we will use the conventional 0.05. [Figure: 8_Multiple_Testing.png]
  24. There is no annotation data set in this example, but in cases where annotation is present, it can be added in the Annotation tab.
  25. Click Run to begin the analysis. The result is the following JMP Journal.
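
The bar operator used in steps 15 and 16 can be illustrated with a short Python sketch. The function `expand_bar` is a hypothetical helper written for this illustration; it is not part of JMP Genomics or SAS, and it only mimics the expansion described in step 15.

```python
def expand_bar(spec):
    """Expand a bar-separated effect list into main effects and interactions.

    Hypothetical helper for illustration only: each new factor is added as a
    main effect and crossed with every term that came before it.
    """
    factors = [f.strip() for f in spec.split("|") if f.strip()]
    terms = []
    for factor in factors:
        crossed = [f"{term}*{factor}" for term in terms]
        terms = terms + [factor] + crossed
    return terms


print(" ".join(expand_bar("Time|Characteristics")))
# prints: Time Characteristics Time*Characteristics
```

For the two variables in this example, the expansion yields the two main effects plus their interaction, matching the explicit form given in step 15.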

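To give a sense of what the FDR setting in steps 22 and 23 does, here is a minimal Python sketch. It assumes the FDR option corresponds to the Benjamini-Hochberg step-up procedure, which the article does not state explicitly; the p-values are made up for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the test is declared significant.

    Assumes the Benjamini-Hochberg step-up rule: find the largest rank k
    such that p_(k) <= (k / m) * alpha, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank

    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Made-up p-values, one per gene.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(p, alpha=0.05))
```

Note that several of these p-values fall below 0.05 yet are not declared significant, because the threshold tightens as the number of tests grows.
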
Results

Interactive reports from this analysis can be found on JMP Public.

[Figure: 1_journal.png]

  1. Click the Distribution Analysis button to bring up the first results window.
  2. The initial tab shows the Kernel Density Estimate curves for each of the experimental groups before normalization.
    • For further data exploration, highlight any of the curves on the plot and then click the Create Subset Experimental Design Data Set, Excluding Selected Curves option to bring up a new EDF without the highlighted curves. [Figure: 2a_densitycurve.png]
  3. The Box Plots tab displays a box plot of the response for each column prior to normalization. [Figure: 2b_boxplots.png]
  4. Return to the Journal window and select the second set of output, Correlation and Principal Variance Component Analysis.
    • Note that all of the Correlation and Principal Variance Component Analysis result windows are identical in output.
  5. The 3D PCA Plot tab gives a 3D scatterplot of the principal components. Red points are groups that were treated with estrogen while blue points are control groups.
    • The points cluster clearly by treatment, suggesting that much of the variance in the data is explained by the estrogen treatment. [Figure: 3_PCA_3D.png]
  6. The 2D PCA Plots tab shows the variance for each principal component group in a matrix. The histograms on the diagonal indicate the percentage of the total variance explained by each principal component. The Scree Plot gives the amount of variance explained by each principal component.
  7. The Correlation Heat Map tab gives a heat map of the correlation matrix. There is a correlation between the treated and untreated groups seen in the increased blue coloring of the upper-right and lower-left quadrants. [Figure: 4_heatmap.png]
  8. The Correlation Distributions tab shows the distributions of the correlation matrix and their p-values. The low p-values are indicative of robust results. [Figure: 5_dist.png]
  9. To bring up the correlation matrix and a matrix of the correlation scatterplots, select Correlation and Grouped Scatterplots from the Launch Follow-Up Process
    • This will open a new dialog box; clicking Run will bring up the matrices.
  10. The resulting output is the correlation matrix that was used to develop the Heat Map and Distribution results. A portion of it is shown below. [Figure: 6_scatmatrix.png]
  11. Return to the Journal window again and select Distribution Analysis. This shows the standardized density curves and box plots resulting from normalization, with the medians of each response group centered at 0.
    • The normalized data sets are now saved to the designated Output Folder and can be inspected by clicking Data Standardize from the journal window. [Figure: 7_std_curve.png] [Figure: 8_std_box.png]
  12. Return to the Journal window again and select the final output option, ANOVA.
  13. The initial results window that opens has a volcano plot showing the differences in expression between sets of experimental groups. [Figure: 9_Anova_window.png]
  14. Notice the x-axis on each of the plots (representing the difference in expression) is not uniform. In order to analyze these plots effectively, the axes will need to be changed. Right-click the x-axis of the first plot and select Axis Settings. [Figure: 1_axis_change.png]
  15. Enter a range of -4.5 to 6 in the Minimum & Maximum fields for Scale.
  16. Enter 0.5 as the Increment option and select OK. [Figure: 2_axischange.png]
  17. Next, right-click on the same axis and select Edit > Copy Axis Settings. This saves the axis settings so they can be applied to the other plots. [Figure: 3_ax_change.png]
  18. Scroll down to the second plot now and hold Ctrl + right-click. Select Edit > Paste Axis Settings. This will apply the copied axis setting to every plot.
  19. Once the axes have been adjusted, take a look at the results. The volcano plot below shows the difference in expression for cells that were given estrogen treatment and cells with no estrogen treatment. The x-axis represents the difference in expression and the y-axis represents the statistical significance of the difference. The red reference line represents a p-value of 0.05. [Figure: 11_volcplot1.png]
  20. The plots below show the difference between estrogen treated and control groups at the three different stages of exposure. [Figure: 12_volcplots2.png]
  21. It is clear from the shape of the volcano plots that exposure to estrogen does cause a change in gene expression in MCF7 cells.
  22. Below, we compare the volcano plots of the control and experimental groups for the change in expression from 12hrs to 24hrs and from 24hrs to 48hrs. Once again there is a decrease in expression levels from the 24hr to 48hr measurements, and an increase in expression from the 12hr to 24hr timepoints. This is more defined in the groups that were exposed to estrogen. [Figure: 13_volc_3.png]
  23. Scroll to the LSMeans Parallel Plots on the right side of the output window. The top plot shows the interaction between characteristics (control vs. experimental groups) and time. Nonparallel lines, as seen in this plot, indicate an interaction between the two variables. The bottom plot shows simple means with no interactions.
    • Note that highlighting any lines on this graph will select their corresponding points in the volcano plots and vice versa. [Figure: 14_LSMeans.png]
  24. Further analyses can be conducted from the Drill Downs Menu, including fitting a model using the ANOVA results, creating a subset of selected data, making intensity plots, and more. [Figure: 15_drilldowns.png]
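
The volcano plot axes described in step 19 can be sketched in a few lines of Python. This assumes the y-axis uses the common -log10(p-value) convention, which the article does not state explicitly; the gene names and values are made up for illustration.

```python
import math

ALPHA = 0.05  # significance level marked by the reference line


def volcano_point(difference, p_value):
    """Map a difference and its p-value to (x, y) volcano coordinates.

    Assumes y = -log10(p), so smaller p-values plot higher on the y-axis.
    """
    return difference, -math.log10(p_value)


# Height of the reference line under the same assumption.
reference_line = -math.log10(ALPHA)

# Made-up (difference, p-value) pairs for three hypothetical genes.
results = {
    "gene_A": (2.1, 0.0004),
    "gene_B": (0.3, 0.4),
    "gene_C": (-1.8, 0.01),
}

for gene, (diff, p) in results.items():
    x, y = volcano_point(diff, p)
    position = "above" if y > reference_line else "below"
    print(f"{gene}: x={x:+.2f}, y={y:.2f} ({position} the reference line)")
```

Points far from zero on the x-axis and above the reference line are the ones of most interest: large differences in expression that are also statistically significant.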

Summary

The Basic Expression Workflow is an effective way to string together a series of analyses and compare expression levels across experimental conditions. This workflow is highly recommended for launching into expression analysis. It allows for data standardization and QC, data exploration such as distribution and correlation analyses, and expression and LSMeans analysis. From the output window, it is easy to continue analysis in the Drill Downs menu and fit a model to your data.
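
As a rough illustration of the data standardization mentioned above, the following Python sketch rescales one column of values to mean 0 and standard deviation 1. This is an assumption about what the STD normalization method does; the exact definition used by JMP Genomics should be checked in the ? help next to Normalization Method. The input values are made up.

```python
from statistics import mean, stdev


def standardize(values):
    """Center values at 0 and scale to unit sample standard deviation.

    Assumed definition of standardization: (value - mean) / std. The
    workflow's STD option may differ in detail.
    """
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - mu) / sigma for v in values]


# Made-up intensity values for a single array column.
column = [7.2, 8.1, 9.4, 6.8, 8.5]
z = standardize(column)

print([round(v, 3) for v in z])
```

Applying the same transformation to every column puts the arrays on a common scale, which is what makes their density curves and box plots comparable after normalization.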

2 Comments
Level II

Are these same analysis options available in JMP 14 Pro?

Staff

Hi @irinastl,

 

These analysis options are not available in JMP in the manner that they are presented above. One could try to replicate the analysis above, but it would take a lot of work and time (button clicking, data table manipulation, some scripting, and various other operations) to reproduce similar results. These basic workflows remove the enormous time and effort needed to create these analyses within JMP, which is why JMP Genomics exists. Also, the sheer number of genes (when looking at the whole genome) is usually in the tens of thousands, which is likely to crash JMP due to memory limitations on a computer.

 

I hope that gives some context.

 

Best,