I have a large genomics database on two groups of patients (continuous variables). One group developed a complication at some point during observation; this group has 4 repeated measures of gene expression collected. The control group has only 3 time points collected. I would like to know whether expression of some genes at baseline (week 0) predicts outcomes: complication (coded yes/no) or death (coded 1 or 2). How do I prepare the data for analysis, and which model should I choose? Do I need to compare LSmeans per subgroup? In this data I have genes in columns and patient IDs in rows.
You can start with logistic regression. The data should be a cross-section of patients, one row per patient, with complication and death columns (coded either as 1/0 or Y/N, with modeling type set to nominal), a treatment/control group indicator (coded as 1/0), and columns for gene expression at baseline and at the other time points, plus any other variables you have available, such as patient demographics.
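As a rough illustration of that layout, here is a minimal Python sketch that reduces a visit-level table to one baseline row per patient. The column names (`patient_id`, `week`, `GENE_A`, `GENE_B`), the group labels, and the yes/no coding are all made up for the example, and it assumes each row of the original table is one patient visit with genes in columns.

```python
# Hypothetical visit-level table: one row per patient visit, genes in columns.
visits = [
    {"patient_id": "P1", "week": 0, "GENE_A": 5.2, "GENE_B": 2.0},
    {"patient_id": "P1", "week": 4, "GENE_A": 6.1, "GENE_B": 2.4},
    {"patient_id": "P2", "week": 0, "GENE_A": 4.8, "GENE_B": 2.9},
    {"patient_id": "P2", "week": 4, "GENE_A": 4.9, "GENE_B": 3.3},
]

# Hypothetical per-patient group labels and outcomes.
patients = {
    "P1": {"group": "treatment", "complication": "yes"},
    "P2": {"group": "control", "complication": "no"},
}


def baseline_cross_section(visits, patients, baseline_week=0):
    """Return one row per patient: 1/0 indicators plus baseline gene columns."""
    rows = []
    for visit in visits:
        if visit["week"] != baseline_week:
            continue  # keep only the baseline visit
        info = patients[visit["patient_id"]]
        row = {
            "patient_id": visit["patient_id"],
            "treated": 1 if info["group"] == "treatment" else 0,
            "complication": 1 if info["complication"] == "yes" else 0,
        }
        # Copy every gene column, tagging it with the time point.
        for key, value in visit.items():
            if key not in ("patient_id", "week"):
                row[f"{key}_wk{baseline_week}"] = value
        rows.append(row)
    return rows


table = baseline_cross_section(visits, patients)
for row in table:
    print(row)
```

The same idea extends to the later time points by adding columns such as `GENE_A_wk4`, so each patient still occupies a single row.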
You can then fit a nominal logistic regression with Complication (and, separately, Death) as the nominal outcome and these columns as model effects: Treatment/Control, Gene Expression at the different time points, and the other variables. You might also want to consider interactions such as Treatment/Control*Gene Expression at Baseline, etc.
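Outside JMP, the same kind of model can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data are simulated, the variable names (`group`, `gene_wk0`) are hypothetical, and the fit uses simple gradient ascent rather than whatever estimation routine JMP uses. It shows a main effect for group, a main effect for baseline expression, their interaction, and the resulting odds ratios.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, n_iter=2000):
    """Fit logistic regression by batch gradient ascent on the mean log-likelihood.

    X is a list of feature rows (no intercept column), y a list of 0/1 outcomes.
    Returns [intercept, coef_1, ..., coef_p].
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for row, yi in zip(X, y):
            xi = [1.0] + list(row)
            err = yi - sigmoid(sum(b * x for b, x in zip(beta, xi)))
            for j in range(p + 1):
                grad[j] += err * xi[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Simulated data (an assumption for illustration, not real patient data).
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    group = rng.randint(0, 1)      # 1 = treatment, 0 = control
    gene = rng.gauss(0.0, 1.0)     # standardized baseline expression
    true_logit = -0.5 + 0.3 * group + 1.5 * gene  # no true interaction here
    outcome = 1 if rng.random() < sigmoid(true_logit) else 0
    X.append([group, gene, group * gene])  # last column is the interaction
    y.append(outcome)

beta = fit_logistic(X, y)
names = ["group", "gene_wk0", "group*gene_wk0"]
odds_ratios = {name: math.exp(b) for name, b in zip(names, beta[1:])}
for name, value in odds_ratios.items():
    print(f"{name}: odds ratio = {value:.2f}")
```

An odds ratio above 1 for `gene_wk0` means higher baseline expression goes with higher odds of the complication in the reference (control) group; the interaction term shows how that relationship differs in the treatment group.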
JMP Documentation on odds ratios from logistic regression
If you have a large genomics database, I assume you may have tens of thousands of genes and therefore tens of thousands of columns. If so, you may want to take a look at JMP Genomics:
Also, you can look at a similar use case by Matthew J Wongchenko et al., who looked at different treatment groups and progression-free survival. Link to paper below.
If you have not used JMP Genomics before, then take a look at this short video:
Let me know if you would like to know more.
In JMP you can fit a logistic regression with a multi-level response,
but I would start with a two-level response, aggregating the complication categories, and see what happens:
1. Complications Yes (Complications Early and Late)
2. Complications No
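
The aggregation above amounts to a simple recode. A minimal Python sketch, assuming the original column holds the three labels "Complications Early", "Complications Late", and "Complications No" (the sample values are invented for illustration):

```python
# Map the multi-level outcome onto a two-level one (1 = Yes, 0 = No).
COLLAPSE = {
    "Complications Early": 1,
    "Complications Late": 1,
    "Complications No": 0,
}


def to_two_level(levels):
    """Recode each multi-level label; raises KeyError on an unexpected label."""
    return [COLLAPSE[level] for level in levels]


# Hypothetical outcome column for four patients.
multi_level = [
    "Complications Early",
    "Complications Late",
    "Complications No",
    "Complications No",
]
two_level = to_two_level(multi_level)
print(two_level)
```

Keeping the original column alongside the recoded one makes it easy to go back to the multi-level model later.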