When asked if I miss being the lead developer of mixed model software at SAS, I usually reply “yes and no.” The “yes” comes from feeling very fortunate to have been involved with this powerful methodology throughout my nearly 20 years at SAS and a part of the strong legacy that began with the work of Jim Goodnight on PROC GLM and PROC VARCOMP in the early days of the company. A big debt of gratitude goes to Dave Delong, who initially pointed me in this direction, and to John Sall and Randy Tobias (our newest ASA Fellow!), who provided key pieces of code that formed the beginnings of PROC MIXED. Interestingly, John’s code was based on the pioneering work of Jennrich and Schluchter on covariance-structure modeling (the R side of the mixed model), while Randy’s code focused on generalizing REML estimation of variance components (the G side of the mixed model). Putting the two together was somewhat serendipitous and turned out to provide a wonderful technology for such diverse applications as animal breeding, clinical trials, manufacturing quality control, education assessment, space-time modeling and statistical genetics.

The “no” comes from confidence in the very impressive work done by our current mixed model gurus: Oliver Schabenberger (PROC GLIMMIX), Chris Gotwalt (JMP), Paul Wright (EVAAS) and Tianlin Wang (PROC HPMIXED). That, and my now decade-long passion for genomics and the true joy that comes from leading a top-notch team and working with all of the dedicated professionals who have helped JMP Genomics become successful.

I’ve been amazed at how mixed model theory is a unifying theme throughout statistics, encompassing such methods as empirical Bayes, ridge regression, time series, kriging, and smoothing splines. Did you know you can fit a support vector machine using the radial basis functions in PROC GLIMMIX? In a bit of a throwback to SAS in the '70s, we’re using variance components on principal components to quantitatively compare sources of variability in microarray data. Most recently, we’ve been exploring how mixed models can effectively adjust for population stratification in genome-wide association studies. So I can’t seem to get away from mixed models — not that I really want to!

Let me close with a mixed model curiosity that I was looking at just last week. Consider the following SAS code, which compares a simple linear regression model with the same model fit with random effects:

/* simulate data for a simple linear regression and compare fixed and random models */

%let seed = 2817340;

data xy;

do slope = 1 to 5;

do nx = 1 to 10;

x1 = rannor(&seed);

x2 = rannor(&seed);

do rep = 1 to 10;

y = x1*slope + x2*slope*2 + rannor(&seed);

output;

end;

end;

end;

run;

ods exclude all;

ods noresults;

proc mixed data=xy method=ml;

by slope;

model y = x1 x2 / s;

ods output fitstatistics=fs covparms=cp solutionf=sf;

run;

ods exclude none;

ods results;

title "Fit Statistics from Fixed Model";

proc sort data=fs;

by descr slope;

run;

proc print data=fs;

run;

title "Covariance Parameters from Fixed Model";

proc print data=cp;

run;

title "Beta Hat from Fixed Model";

proc sort data=sf;

by Effect Slope;

run;

proc print data=sf;

run;

ods exclude all;

ods noresults;

proc mixed data=xy method=ml;

by slope;

model y = / s;

random x1 x2 / s;

ods output fitstatistics=fsr covparms=cpr solutionf=sfr solutionr=srr ;

run;

ods exclude none;

ods results;

title "Fit Statistics from Random Model";

proc sort data=fsr;

by descr slope;

run;

proc print data=fsr;

run;

title "Covariance Parameters from Random Model";

proc sort data=cpr;

by CovParm Slope;

run;

proc print data=cpr;

run;

title "Beta Hat from Random Model";

proc print data=sfr;

run;

title "Gamma Hat from Random Model";

proc sort data=srr;

by effect slope;

run;

proc print data=srr;

run;

The second PROC MIXED call does something that I would have (up until last week!) recommended against — using simple covariates in a RANDOM statement with no SUBJECT= effect. It turns out that the gamma-hat estimates (the empirical BLUPs) are virtually identical with the beta-hat estimates from the first simple regression model. So covariance structure modeling is intimately connected with mean modeling.

If you’re really into mixed models (or would like to be), here’s a question for you: Why are the variance component estimates from the second model approximately equal to the square of the slopes? Hint: The answer is not “yes and no.”