Share your ideas for the JMP Scripting Unsession at Discovery Summit by September 17th. We hope to see you there!
Choose Language Hide Translation Bar

Generalized Linear Mixed Model Add-in

This add-in allows you to run the generalized linear mixed model in JMP. Default parameters settings are mostly the same as the SAS %GLIMMIX macro, which uses iteratively reweighted likelihoods to fit the model (Wolfinger & O'connell, 1993).

p0.png

The Select Columns panel shows all the variables in the current dataset.

 

The Pick Role Variables panel allows you to specify:

  • Y: the response variable.
  • Weight: a weighting variable for the analysis.
  • Offset: the offset variable. The default is no offset.
  • By: the by variable allows you to fit models for subgroups simultaneously.

The Construct Model Effects panel includes:

  • Fixed effects
  • Random effects

The Options panel includes:

  • Distribution: the response variable distribution. The default is Binomial.
  • Link: the link function to use to transform the response variable into pseudo scores before utilizing the mixed model. The canonical link functions are provided.
  • Max Iteration: the maximum number of iterations for the algorithm to converge. The default value is 30.
  • Convergence criterion: if the RMSE is less than the value of convergence criterion, then the iteration will stop. The default value is 1E-8.
  • Correction factor: a constant added to the response variable at the initial transformation step in order to avoid singularities. The default value is 0.5.
  • By Group Report Limit: when specifying a By variable, if the number of subgroups is greater than this limit number, the report output will be suppressed. Instead, a summary table and a Graph Builder interface will be provided.

We will illustrate the details of the add-in using examples available online (%GLIMMIX macro results website: https://support.sas.com/techsup/notes/v8/25/030.html or the PROC GLIMMIX website: https://go.documentation.sas.com/?docsetId=statug&docsetVersion=15.1&docsetTarget=statug_glimmix_exa... ). To derive comparative results using the SAS Macro, set ‘DDFM=KR’ option in the MODEL statement, also set ‘NOBOUND’ option in the PROC MIXED statement.

 

Demo videos are also available online:

Example 6 -- Salamander data long format: https://youtu.be/6gCqufudFV4

Example 7 -- RNA-seq data: https://youtu.be/o4LeTmVzzVE

 

Example 1 – Poisson distribution (ship_demo.jmp):

The following is glimpse of the ship data (originally from McCullagh and Nelder (1989), Section 6.3.).

The response variable is “N”, which follows Poisson distribution. The fixed effect in the Generalized Linear Mixed Model is “type”; and the random effects are “year, period, year*period”. The offset variable is “service”.

p2.png

The default link function for Poisson distribution is the ‘log’ link. Note that, if we want to use the canonical (default) links for a selected distribution, we can just leave the link box blank.

Interaction terms can be added by selecting two or more variables and click the “cross” button in the Fixed Effects or Random Effects panel.

p3.png

Click “Run” button after specifying all the model parameters. After 28 iterations, the algorithm converged. The result output is of the Mixed Model format in JMP. Note that when comparing to the %GLIMMIX SAS macro results, due to the difference in design matrix coding, we suggest comparing the F-test results, LS means or Variance component estimates instead of the fixed effects parameter estimates.

p4.png

 

Example 2 – Binary distribution (sal1.jmp):

The following is glimpse of the Salamander data (originally from McCullagh and Nelder (1989), Section 14.5). Note that before running the analysis, define proper modeling type for variables of interest. In the example below, the variables “fnum, mnum” are continuous numbers. But if we want to use them as categorical variables, we can right click the variable name > column information, then change the modeling type to nominal.

p5.png

Model specification: Response variable: ybin. Distribution: Binomial. Link function: logit. 

Fixed effect: fpop, mpop, fpop*mpop. Random effect: fpop*fnum, mpop*mnum.

p6.png

The algorithm converges after 10 iterations.

p7.png

 

Example 3 – Binomial distribution (blotch.jmp)

p8.png

Data source: https://go.documentation.sas.com/?docsetId=statug&docsetTarget=statug_code_gmxex04.htm&docsetVersion...

 

The response variable “prop” indicates the incidence. “site” and “variety” are the two fixed effects. We select the “Binomial” distribution and the default link function is “Logit”. (We specify no random effect in this example.)

p9.png

The LSmeans of “variety” and “site” are consistent with that from the %GLIMMIX macro and PROC GLIMMIX. However, the F-test results showed discrepancies.

p10.png

 

Example 4 – Binomial distribution (rcb_binomial.jmp)

p12.png

 After 6 iterations, we derive the following result. This result is consistent with the result generated by the %GLIMMIX macro, but slightly different from the result from PROC GLIMMIX.

p13.png

 

Example 5 – Binomial distribution (hessianfly.jmp)

p14.png

Data source: https://go.documentation.sas.com/?docsetId=statug&docsetTarget=statug_code_gmxex01.htm&docsetVersion...

 

The response is “y/n”, which is of the form “events/trials” or “successes/total trials”. When specifying the response variables Y, we need to select the “events” variable first, in this case “y”, and then select the “trials” variable afterwards. Hence, “n” is selected after “y”. Set “entry” as fixed effect and “block” as random effect. Choose “Binomial” distribution and use the default “Logit” link function.

p15.png

After 6 iterations, we derive the results as follows:

p16.png

The results of the above examples can be verified by the SAS PROC GLIMMIX and the %GLIMMIX macro. The code SAS_Example_macro_proc_verification.sas is available upon request.

 

Example 6 – Binary distribution (sal1_long.jmp):

When the data consists of subsets and you wish to model each group/subset independently, you can specify the group/subset indicator variable in the “By” field.

In this example, the sal1_long.jmp is the stacked version of the Salamander data (originally from McCullagh and Nelder (1989), Section 14.5). Here, the response variables “y1, y2, y3” are stacked into the variable “ystack”. The variable “ycat” indicates the original response category.

p17.png

Here, specify “ystack” as the response variable, and “ycat” as the By variable. Similar to the settings in example 2, we can specify the distribution, fixed effect, random effect.

p18.png

The GLMM procedure will be performed for each group/subset specified in the By field. The final report is as below. Results are listed for each group/subset – response y1_pseudo, response y2_pseudo, and response y3_pseudo.

p19.png

 

Example 7 – RNA-seq data

Since genomics data usually is of high dimension, where the number of genes / SNPs is usually greater than 10k, performing the integrated analysis could be computationally challenging. For future interest of using this add-in to analyze the RNA-seq data, we test the performance when increase the number of genes which are specified in the By field. This section will serve as a part of the stress test, as the By function is not written in multi-threading.

The example data used here is public online, from the package (Hoffman & Roussos, 2020): https://bioconductor.org/packages/release/bioc/vignettes/variancePartition/inst/doc/dream.html

  • varPartDEdata_count.jmp is the original count matrix
  • varPartDEdata_metadata.jmp is the metadata for the 24 samples
  • varPartDEdata_stack10.jmp is the processed data, where gene expression from the first 10 genes are stacked, along with other metadata variables.
  • varPartDEdata_stack100.jmp is the processed data, where gene expression from the first 100 genes are stacked, along with other metadata variables.

p20.png

From the example online, for each gene, we can fit the model as:

Y ~ DiseaseSubtype + Sex + (1|Individual)

 

, where “DiseaseSubtype” and “Sex” are fixed effects and “Individual” is the random effect. Here the response variable Y

will depend on the choice of distribution. One choice of the distribution is “Poisson” distribution, and in this case we use the raw count “expression” as the response variable. An alternative way to model the RNA-seq data is taking the log transformation of the counts first, and use “logexprs” as the response variable while specifying the “Normal” distribution.

 

p21.png


p22.png

 

Using the example data varPartDEdata_stack10.jmp, we can derive the results for each of the ten genes separately:

p23.png

 

Using the “logexprs” as the response variable, the results from Analyze>Fit Model>Mixed Model and our GLMM add-in are the same. When using the raw counts “expression” as the response and specify “Poisson” distribution, the run time is longer. If the maximum number of iterations is reached, it could imply the selected distribution or link function may not well fit for the data.

 

Number of subgroups (genes) in By variable

Poisson Run time (HH:MM:SS)

Normal Run time (HH:MM:SS)

Mixed Model using log(count) run time (HH:MM:SS)

10

00:01:15

00:00:06

 

100

00:06:00

00:00:43

00:00:01

1000

01:25:19

00:09:05

00:01:41

10000

11:11:05

01:31:58

 

……

 

 

 

 

When the number of genes/subgroups specified in By field is greater than By Group Report Limit=10, for example using varPartDEdata_stack100.jmp, we will no longer show the report. Instead, a “GLMM result” JMP table along with a volcano plot will be provided. The “GLMM result” table includes part of the F-test results (DFDen_, F Ratio_, Prob > F_), the least square means (LSMeans:), as well as part of the Parameter Estimates (Estimate:, -log10(Prob>|t|):).

By specifying “logexprs” as Y, and specifying “Normal” distribution, we can derive the dataset below. (Available as GLMM result.jmp)

 

p24.png

 

The default volcano plot is generated for the first fixed effect specified. You can switch columns to look at volcano plot for other variables of interest.

 

p25.png

(100 genes, logexprs)

 

p26.png

(10000 genes, logexprs)

 

p27.png

(10000 genes, expression, Poisson)

 

References

Hoffman, G. E., & Roussos, P. (2020). dream: Powerful differential expression analysis for repeated measures designs. BioRxiv, 432567.

Wolfinger, R., & O'connell, M. (1993). Generalized linear mixed models a pseudo-likelihood approach. Journal of statistical Computation and Simulation, 48(3-4), 233-243.

Comments

This add-in will be of great value to JMP users.

Hi

Yes, this looks very useful.  I tried the ship example and it works, but the last example with the RNA-seq data gives me a cryptic error, and it also gives me a somewhat similar error with my own RNA-seq data.  With the varPartDEdata_stack10.jmp data set, I followed the directions (and YouTube video) with Y=logexprs, By = Gene, Normal distribution, fixed effects of Sex and DiseaseSubtype and random effect of Individual.  I hit Run and it has an error:  Name Unresolved: Gene at row 1 in access or evaluation of 'Gene', Gene/*###*/.

Any suggestions?

Thanks, Peter

Hi @PDunn , I see your problem and yes it did happen to me sometimes and I'm trying to fix it. With the current version, can you try to re-install the add-in and see if the RNA-seq example work? Renaming the 'by' variable in the datatable sometimes also work. 

Hi
Thanks for getting back to me. I just "unregistered" the add-in, then re-installed it. Then I tried the add in again on the "varPartDEdata_stack10" file. The addin did not work even after I renamed the "by" variable to Gene1 (originally Gene) or copied it to a new column and renamed it. I also tried restarting the program (JMP 16EA) and it still had the problem.
So I'm not sure what could be wrong. The error says: Name Unresolved: Gene1 at row1 in access or evaluation of 'Gene1', Gene1/*###*/
If I leave out the "by" variable (Gene1), it will run though, but it is obviously valuable to know what genes are significant here.
Thanks, Peter

Hi Peter @PDunn , thanks for trying that! (did you try not "unregister" it and directly re-install it?) There's definitely something that needs to be fixed! The "By" function is added recently and it might need more debugging. I was using JMP 15.2 while writing the add-in but I don't think the version matters. This is actually my summer internship project and it might take me a while to fix the bug and update it since now I have limited access to JMP. I'll keep you updated once I have a solution! 

Hi
Thanks for getting back to me again. Yes, I tried re-installing it and I still had the same error. I also tried it on JMP 14 on a mac. I was using Win10 before. Still the same issue. But after thinking about this more, and looking at the output (the online file GLMM result.jmp), I think I can still use this without the “By” option. For surveying a lot of genes at once, I actually want the per gene significance results which I do not get with the “by” command. On the other hand, you probably need the “by” command to get the volcano plots. At least for normally distributed response variables, I can keep using MixedModel. It would be nice to have this running though. I wonder if they have this running in the JMP Genomics package (I’ve never looked at it).
Thanks for your help again, and have a good rest of your summer. Peter

Hi Peter @PDunn , yes the volcano plot is a nice summary figure and the MixedModel does not directly generate that. Will update if this gets fixed! JMP Genomics has a nice RNA-seq analysis workflow and you can check on that if interested. -Meichen

Hi @PDunn , I've updated the add-in file here. Could you please download it again and try it now? I think this fix should solve the problem. Please let me know if you have more questions! -Meichen

Hi Meichen
Thanks very much. It works great! Hopefully that was not too much extra work to fix.
Peter