This addin allows you to run the generalized linear mixed model in JMP. Default parameters settings are mostly the same as the SAS %GLIMMIX macro, which uses iteratively reweighted likelihoods to fit the model (Wolfinger & O'connell, 1993).
The Select Columns panel shows all the variables in the current dataset.
The Pick Role Variables panel allows you to specify:
The Construct Model Effects panel includes:
The Options panel includes:
We will illustrate the details of the addin using examples available online (%GLIMMIX macro results website: https://support.sas.com/techsup/notes/v8/25/030.html or the PROC GLIMMIX website: https://go.documentation.sas.com/?docsetId=statug&docsetVersion=15.1&docsetTarget=statug_glimmix_exa... ). To derive comparative results using the SAS Macro, set ‘DDFM=KR’ option in the MODEL statement, also set ‘NOBOUND’ option in the PROC MIXED statement.
Example 6  Salamander data long format: https://youtu.be/6gCqufudFV4
Example 7  RNAseq data: https://youtu.be/o4LeTmVzzVE
The following is glimpse of the ship data (originally from McCullagh and Nelder (1989), Section 6.3.).
The response variable is “N”, which follows Poisson distribution. The fixed effect in the Generalized Linear Mixed Model is “type”; and the random effects are “year, period, year*period”. The offset variable is “service”.
The default link function for Poisson distribution is the ‘log’ link. Note that, if we want to use the canonical (default) links for a selected distribution, we can just leave the link box blank.
Interaction terms can be added by selecting two or more variables and click the “cross” button in the Fixed Effects or Random Effects panel.
Click “Run” button after specifying all the model parameters. After 28 iterations, the algorithm converged. The result output is of the Mixed Model format in JMP. Note that when comparing to the %GLIMMIX SAS macro results, due to the difference in design matrix coding, we suggest comparing the Ftest results, LS means or Variance component estimates instead of the fixed effects parameter estimates.
The following is glimpse of the Salamander data (originally from McCullagh and Nelder (1989), Section 14.5). Note that before running the analysis, define proper modeling type for variables of interest. In the example below, the variables “fnum, mnum” are continuous numbers. But if we want to use them as categorical variables, we can right click the variable name > column information, then change the modeling type to nominal.
Model specification: Response variable: ybin. Distribution: Binomial. Link function: logit.
Fixed effect: fpop, mpop, fpop*mpop. Random effect: fpop*fnum, mpop*mnum.
The algorithm converges after 10 iterations.
Data source: https://go.documentation.sas.com/?docsetId=statug&docsetTarget=statug_code_gmxex04.htm&docsetVersion...
The response variable “prop” indicates the incidence. “site” and “variety” are the two fixed effects. We select the “Binomial” distribution and the default link function is “Logit”. (We specify no random effect in this example.)
The LSmeans of “variety” and “site” are consistent with that from the %GLIMMIX macro and PROC GLIMMIX. However, the Ftest results showed discrepancies.
After 6 iterations, we derive the following result. This result is consistent with the result generated by the %GLIMMIX macro, but slightly different from the result from PROC GLIMMIX.
Data source: https://go.documentation.sas.com/?docsetId=statug&docsetTarget=statug_code_gmxex01.htm&docsetVersion...
The response is “y/n”, which is of the form “events/trials” or “successes/total trials”. When specifying the response variables Y, we need to select the “events” variable first, in this case “y”, and then select the “trials” variable afterwards. Hence, “n” is selected after “y”. Set “entry” as fixed effect and “block” as random effect. Choose “Binomial” distribution and use the default “Logit” link function.
After 6 iterations, we derive the results as follows:
The results of the above examples can be verified by the SAS PROC GLIMMIX and the %GLIMMIX macro. The code SAS_Example_macro_proc_verification.sas is available upon request.
When the data consists of subsets and you wish to model each group/subset independently, you can specify the group/subset indicator variable in the “By” field.
In this example, the sal1_long.jmp is the stacked version of the Salamander data (originally from McCullagh and Nelder (1989), Section 14.5). Here, the response variables “y1, y2, y3” are stacked into the variable “ystack”. The variable “ycat” indicates the original response category.
Here, specify “ystack” as the response variable, and “ycat” as the By variable. Similar to the settings in example 2, we can specify the distribution, fixed effect, random effect.
The GLMM procedure will be performed for each group/subset specified in the By field. The final report is as below. Results are listed for each group/subset – response y1_pseudo, response y2_pseudo, and response y3_pseudo.
Since genomics data usually is of high dimension, where the number of genes / SNPs is usually greater than 10k, performing the integrated analysis could be computationally challenging. For future interest of using this addin to analyze the RNAseq data, we test the performance when increase the number of genes which are specified in the By field. This section will serve as a part of the stress test, as the By function is not written in multithreading.
The example data used here is public online, from the package (Hoffman & Roussos, 2020): https://bioconductor.org/packages/release/bioc/vignettes/variancePartition/inst/doc/dream.html
From the example online, for each gene, we can fit the model as:
Y ~ DiseaseSubtype + Sex + (1Individual)
, where “DiseaseSubtype” and “Sex” are fixed effects and “Individual” is the random effect. Here the response variable Y
will depend on the choice of distribution. One choice of the distribution is “Poisson” distribution, and in this case we use the raw count “expression” as the response variable. An alternative way to model the RNAseq data is taking the log transformation of the counts first, and use “logexprs” as the response variable while specifying the “Normal” distribution.
Using the example data varPartDEdata_stack10.jmp, we can derive the results for each of the ten genes separately:
Using the “logexprs” as the response variable, the results from Analyze>Fit Model>Mixed Model and our GLMM addin are the same. When using the raw counts “expression” as the response and specify “Poisson” distribution, the run time is longer. If the maximum number of iterations is reached, it could imply the selected distribution or link function may not well fit for the data.
Number of subgroups (genes) in By variable 
Poisson Run time (HH:MM:SS) 
Normal Run time (HH:MM:SS) 
Mixed Model using log(count) run time (HH:MM:SS) 
10 
00:01:15 
00:00:06 

100 
00:06:00 
00:00:43 
00:00:01 
1000 
01:25:19 
00:09:05 
00:01:41 
10000 
11:11:05 
01:31:58 

…… 



When the number of genes/subgroups specified in By field is greater than By Group Report Limit=10, for example using varPartDEdata_stack100.jmp, we will no longer show the report. Instead, a “GLMM result” JMP table along with a volcano plot will be provided. The “GLMM result” table includes part of the Ftest results (DFDen_, F Ratio_, Prob > F_), the least square means (LSMeans:), as well as part of the Parameter Estimates (Estimate:, log10(Prob>t):).
By specifying “logexprs” as Y, and specifying “Normal” distribution, we can derive the dataset below. (Available as GLMM result.jmp)
The default volcano plot is generated for the first fixed effect specified. You can switch columns to look at volcano plot for other variables of interest.
(100 genes, logexprs)
(10000 genes, logexprs)
(10000 genes, expression, Poisson)
Hoffman, G. E., & Roussos, P. (2020). dream: Powerful differential expression analysis for repeated measures designs. BioRxiv, 432567.
Wolfinger, R., & O'connell, M. (1993). Generalized linear mixed models a pseudolikelihood approach. Journal of statistical Computation and Simulation, 48(34), 233243.
This addin will be of great value to JMP users.
Hi
Yes, this looks very useful. I tried the ship example and it works, but the last example with the RNAseq data gives me a cryptic error, and it also gives me a somewhat similar error with my own RNAseq data. With the varPartDEdata_stack10.jmp data set, I followed the directions (and YouTube video) with Y=logexprs, By = Gene, Normal distribution, fixed effects of Sex and DiseaseSubtype and random effect of Individual. I hit Run and it has an error: Name Unresolved: Gene at row 1 in access or evaluation of 'Gene', Gene/*###*/.
Any suggestions?
Thanks, Peter
Hi @PDunn , I see your problem and yes it did happen to me sometimes and I'm trying to fix it. With the current version, can you try to reinstall the addin and see if the RNAseq example work? Renaming the 'by' variable in the datatable sometimes also work.
Hi Peter @PDunn , thanks for trying that! (did you try not "unregister" it and directly reinstall it?) There's definitely something that needs to be fixed! The "By" function is added recently and it might need more debugging. I was using JMP 15.2 while writing the addin but I don't think the version matters. This is actually my summer internship project and it might take me a while to fix the bug and update it since now I have limited access to JMP. I'll keep you updated once I have a solution!
Hi Peter @PDunn , yes the volcano plot is a nice summary figure and the MixedModel does not directly generate that. Will update if this gets fixed! JMP Genomics has a nice RNAseq analysis workflow and you can check on that if interested. Meichen
Hi @PDunn , I've updated the addin file here. Could you please download it again and try it now? I think this fix should solve the problem. Please let me know if you have more questions! Meichen