Nice analyses @abmayfield--you have that JMP table tricked out! With only 12 observations and binary presence/absence for the 769 proteins, it’s important not to overfit or overinterpret, but here are several more ideas:
- The following steps should produce a WGCNA-style analysis:
1. Transpose, delete the four poor-quality samples
2. Run Hierarchical Clustering in two-way mode to get a heatmap like the one you have already made
3. Choose the number of clusters interactively (coloring can help) and Save Cluster as a new column
4. Summarize to get the mean of all variables by Cluster, which is an average protein profile for each cluster, aka eigenprotein. When running Summary, specify “column” as “statistics column name format” to facilitate the next step.
5. Transpose back and merge with experimental factors
6. Create numerical versions of the experimental factors
7. Run Multivariate to correlate the eigenproteins with the numeric experimental factors
- More principled and powerful is to fit ANOVA models with all three factors at once, though you need to be careful with the limited degrees of freedom. You can do this with the eigenproteins in Fit Model. You can also do it with the original proteins, but the large number of reports can be unwieldy; the main trick in that case is to right-click on any report table > Make Combined Data Table.
- JMP Genomics has a more comprehensive workflow and interactive dashboard output from its Row-by-Row Modeling menu. You can even do mixed models (e.g. make genotype a random effect). JMP Genomics has numerous other helpful routines, as it is designed for high-throughput data sets like this one.
- Use significant proteins of interest and/or the preceding eigenproteins in Structural Equation Modeling or the Partial Correlation Diagram add-in to infer potential causal relationships.
- If you have pathway annotations for the proteins, compute pathway-based scores and then analyze them versus the experimental factors.
- Try the add-in from MJ Guan for low-dimensional projections based on t-SNE and UMAP.
- For PLS, I think you would first need to create binary indicator variables for all experimental factors with Cols > Utilities > Make Indicator Columns.
- Run Analyze > Screening > Response Screening to fit all Y by X combos and select proteins based on FDR-adjusted p-values. I tried this and nothing is statistically significant, but there are a dozen or so proteins with small raw p-values.
- If you can obtain continuous measures of protein expression instead of presence/absence, the preceding analyses should be more informative.
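
If it helps to see the arithmetic outside of JMP, here is a rough Python sketch of two of the calculations mentioned above: the eigenprotein step (mean presence per cluster, then correlation with a numerically coded factor) and FDR adjustment of a list of raw p-values. This is only an illustration under assumptions, not what JMP runs internally. The data layout, the function names, and the toy values are all made up for the example, and the FDR function assumes the Benjamini-Hochberg procedure.

```python
from collections import defaultdict


def eigenproteins(presence, cluster_of):
    """Average the 0/1 profiles of the proteins in each cluster.

    presence:   {protein: [0/1 per sample]}  (hypothetical layout)
    cluster_of: {protein: cluster label}, e.g. exported after Save Cluster
    Returns {cluster: [mean presence per sample]}.
    """
    members = defaultdict(list)
    for protein, profile in presence.items():
        members[cluster_of[protein]].append(profile)
    return {
        cluster: [sum(col) / len(col) for col in zip(*profiles)]
        for cluster, profiles in members.items()
    }


def pearson(x, y):
    """Plain Pearson correlation; undefined if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum so the
    # adjusted values stay monotone in the raw p-values.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for steps_from_top, i in enumerate(order):
        rank = m - steps_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data invented for illustration only (4 proteins, 4 samples).
presence = {
    "P1": [1, 1, 0, 0],
    "P2": [1, 0, 0, 0],
    "P3": [0, 0, 1, 1],
    "P4": [0, 1, 1, 1],
}
cluster_of = {"P1": "A", "P2": "A", "P3": "B", "P4": "B"}
factor_code = [0, 0, 1, 1]  # numeric version of a hypothetical two-level factor

eigen = eigenproteins(presence, cluster_of)
correlations = {c: pearson(profile, factor_code) for c, profile in eigen.items()}
adjusted = bh_adjust([0.01, 0.04, 0.03, 0.005])  # made-up raw p-values

print(eigen)
print(correlations)
print(adjusted)
```

With only 12 observations, the correlations from a sketch like this are best read as descriptive. The adjusted p-values also show why a dozen or so small raw p-values can still fail an FDR cutoff once all 769 proteins are counted.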