Aug 6, 2020 3:20 PM
| Last Modified: Aug 26, 2020 2:50 PM
Generalized Linear Mixed Model.jmpaddin
This add-in allows you to run the generalized linear mixed model in JMP. Default parameters settings are mostly the same as the SAS %GLIMMIX macro, which uses iteratively reweighted likelihoods to fit the model (Wolfinger & O'connell, 1993).
The Select Columns panel shows all the variables in the current dataset.
The Pick Role Variables panel allows you to specify:
Y: the response variable.
Weight: a weighting variable for the analysis.
Offset: the offset variable. The default is no offset.
By: the by variable allows you to fit models for subgroups simultaneously.
The Construct Model Effects panel includes:
The Options panel includes:
Distribution: the response variable distribution. The default is Binomial.
Link: the link function to use to transform the response variable into pseudo scores before utilizing the mixed model. The canonical link functions are provided.
Max Iteration: the maximum number of iterations for the algorithm to converge. The default value is 30.
Convergence criterion: if the RMSE is less than the value of convergence criterion, then the iteration will stop. The default value is 1E-8.
Correction factor: a constant added to the response variable at the initial transformation step in order to avoid singularities. The default value is 0.5.
By Group Report Limit: when specifying a By variable, if the number of subgroups is greater than this limit number, the report output will be suppressed. Instead, a summary table and a Graph Builder interface will be provided.
The following is glimpse of the ship data (originally from McCullagh and Nelder (1989), Section 6.3.).
The response variable is “N”, which follows Poisson distribution. The fixed effect in the Generalized Linear Mixed Model is “type”; and the random effects are “year, period, year*period”. The offset variable is “service”.
The default link function for Poisson distribution is the ‘log’ link. Note that, if we want to use the canonical (default) links for a selected distribution, we can just leave the link box blank.
Interaction terms can be added by selecting two or more variables and click the “cross” button in the Fixed Effects or Random Effects panel.
Click “Run” button after specifying all the model parameters. After 28 iterations, the algorithm converged. The result output is of the Mixed Model format in JMP. Note that when comparing to the %GLIMMIX SAS macro results, due to the difference in design matrix coding, we suggest comparing the F-test results, LS means or Variance component estimates instead of the fixed effects parameter estimates.
Example 2 – Binary distribution (sal1.jmp):
The following is glimpse of the Salamander data (originally from McCullagh and Nelder (1989), Section 14.5). Note that before running the analysis, define proper modeling type for variables of interest. In the example below, the variables “fnum, mnum” are continuous numbers. But if we want to use them as categorical variables, we can right click the variable name > column information, then change the modeling type to nominal.
Model specification: Response variable: ybin. Distribution: Binomial. Link function: logit.
Fixed effect: fpop, mpop, fpop*mpop. Random effect: fpop*fnum, mpop*mnum.
The response variable “prop” indicates the incidence. “site” and “variety” are the two fixed effects. We select the “Binomial” distribution and the default link function is “Logit”. (We specify no random effect in this example.)
The LSmeans of “variety” and “site” are consistent with that from the %GLIMMIX macro and PROC GLIMMIX. However, the F-test results showed discrepancies.
Example 4 – Binomial distribution (rcb_binomial.jmp)
After 6 iterations, we derive the following result. This result is consistent with the result generated by the %GLIMMIX macro, but slightly different from the result from PROC GLIMMIX.
Example 5 – Binomial distribution (hessianfly.jmp)
The response is “y/n”, which is of the form “events/trials” or “successes/total trials”. When specifying the response variables Y, we need to select the “events” variable first, in this case “y”, and then select the “trials” variable afterwards. Hence, “n” is selected after “y”. Set “entry” as fixed effect and “block” as random effect. Choose “Binomial” distribution and use the default “Logit” link function.
After 6 iterations, we derive the results as follows:
The results of the above examples can be verified by the SAS PROC GLIMMIX and the %GLIMMIX macro. The code SAS_Example_macro_proc_verification.sas is available upon request.
Example 6 – Binary distribution (sal1_long.jmp):
When the data consists of subsets and you wish to model each group/subset independently, you can specify the group/subset indicator variable in the “By” field.
In this example, the sal1_long.jmp is the stacked version of the Salamander data (originally from McCullagh and Nelder (1989), Section 14.5). Here, the response variables “y1, y2, y3” are stacked into the variable “ystack”. The variable “ycat” indicates the original response category.
Here, specify “ystack” as the response variable, and “ycat” as the By variable. Similar to the settings in example 2, we can specify the distribution, fixed effect, random effect.
The GLMM procedure will be performed for each group/subset specified in the By field. The final report is as below. Results are listed for each group/subset – response y1_pseudo, response y2_pseudo, and response y3_pseudo.
Example 7 – RNA-seq data
Since genomics data usually is of high dimension, where the number of genes / SNPs is usually greater than 10k, performing the integrated analysis could be computationally challenging. For future interest of using this add-in to analyze the RNA-seq data, we test the performance when increase the number of genes which are specified in the By field. This section will serve as a part of the stress test, as the By function is not written in multi-threading.
varPartDEdata_count.jmp is the original count matrix
varPartDEdata_metadata.jmp is the metadata for the 24 samples
varPartDEdata_stack10.jmp is the processed data, where gene expression from the first 10 genes are stacked, along with other metadata variables.
varPartDEdata_stack100.jmp is the processed data, where gene expression from the first 100 genes are stacked, along with other metadata variables.
From the example online, for each gene, we can fit the model as:
Y ~ DiseaseSubtype + Sex + (1|Individual)
, where “DiseaseSubtype” and “Sex” are fixed effects and “Individual” is the random effect. Here the response variable Y
will depend on the choice of distribution. One choice of the distribution is “Poisson” distribution, and in this case we use the raw count “expression” as the response variable. An alternative way to model the RNA-seq data is taking the log transformation of the counts first, and use “logexprs” as the response variable while specifying the “Normal” distribution.
Using the example data varPartDEdata_stack10.jmp, we can derive the results for each of the ten genes separately:
Using the “logexprs” as the response variable, the results from Analyze>Fit Model>Mixed Model and our GLMM add-in are the same. When using the raw counts “expression” as the response and specify “Poisson” distribution, the run time is longer. If the maximum number of iterations is reached, it could imply the selected distribution or link function may not well fit for the data.
Number of subgroups (genes) in By variable
Poisson Run time (HH:MM:SS)
Normal Run time (HH:MM:SS)
Mixed Model using log(count) run time (HH:MM:SS)
When the number of genes/subgroups specified in By field is greater than By Group Report Limit=10, for example using varPartDEdata_stack100.jmp, we will no longer show the report. Instead, a “GLMM result” JMP table along with a volcano plot will be provided. The “GLMM result” table includes part of the F-test results (DFDen_, F Ratio_, Prob > F_), the least square means (LSMeans:), as well as part of the Parameter Estimates (Estimate:, -log10(Prob>|t|):).
By specifying “logexprs” as Y, and specifying “Normal” distribution, we can derive the dataset below. (Available as GLMM result.jmp)
The default volcano plot is generated for the first fixed effect specified. You can switch columns to look at volcano plot for other variables of interest.
(100 genes, logexprs)
(10000 genes, logexprs)
(10000 genes, expression, Poisson)
Hoffman, G. E., & Roussos, P. (2020). dream: Powerful differential expression analysis for repeated measures designs. BioRxiv, 432567.
Wolfinger, R., & O'connell, M. (1993). Generalized linear mixed models a pseudo-likelihood approach. Journal of statistical Computation and Simulation, 48(3-4), 233-243.