JMP Pro 11 has added a new modeling personality, Mixed Model, to its Fit Model platform. What’s a mixed model? How does JMP Pro fit such a model? What are the key applications where mixed models can be applied? In this and future blog posts, I will try to dispel myths about mixed models and illustrate the software’s capabilities with real-life examples.
What’s a Linear Mixed Model?
Linear mixed models are a generalization of linear regression models, y=Xβ+ε. The linear regression model is fit to a sample of cross-sectional data by standard least squares to estimate the fixed-effect parameters, β. Extending the model to allow for random effects, γ, with design matrix Z, the regression model becomes y=Xβ+Zγ+ε. It’s called a mixed model because it contains both fixed effects and random effects.
We make the following assumptions about the random effect parameters, γ, and the random error, ε: (1) γ and ε are normally distributed, and (2) there is no covariance between γ and ε. JMP provides an unstructured covariance structure for γ, and several commonly used structures for ε. Using the restricted maximum likelihood method (REML), JMP jointly estimates β as well as the covariance matrices for γ and ε. In order to fit such a model, additional data on each subject is required, or, in the case of modeling spatial data, dimensions of measurements are needed. (In recent years, mixed model theory has been extended to encompass such statistical methods as empirical Bayes, ridge regression, time series and smoothing splines. However, I limit the scope of my discussion to the “traditional” use of linear mixed models.)
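The model and its assumptions can be written compactly. The symbols G and R for the covariance matrices of γ and ε are notation introduced here for illustration (they follow common mixed-model convention, not anything named in JMP's reports):

```latex
\begin{aligned}
y &= X\beta + Z\gamma + \varepsilon, \\
\gamma &\sim N(0, G), \qquad \varepsilon \sim N(0, R), \qquad \operatorname{Cov}(\gamma, \varepsilon) = 0, \\
\operatorname{Var}(y) &= Z G Z^{\top} + R .
\end{aligned}
```

The last line shows why the covariance structures chosen for γ and ε matter: together they determine the covariance among the responses.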
Why Mixed Models?
When responses are correlated or an important causal factor is omitted, failing to account for it can lead to under- or overestimating the effects of treatment and other factors.
Here are some common use cases for mixed models:
Allowing coefficients (e.g., intercept and slope) to vary randomly across subjects (random coefficient models). A variant is the individual growth model, which can be used to predict individual growth trajectories and to analyze degradation data.
Analysis of randomized block designs, and split-plot designs where hard-to-change and easy-to-change factors result in multiple error terms.
Controlling for unobserved individual heterogeneity in the form of random effects (panel data models).
Analysis of repeated measures where within-subject errors are correlated.
Correlated responses where different measures are taken from the same subjects.
Hierarchically nested subjects (e.g., students within schools). This is known as a hierarchical linear model or multilevel model.
Spatial variability (geostatistics).
The list goes on and on. With JMP Pro 11, you can easily specify and fit all of these models using the point-and-click interface and review the results in a user-friendly way. Before I turn to my first example, let me outline the general steps for specifying your mixed model in JMP Pro.
Steps for Specifying Mixed Models
Select Analyze > Fit Model, and choose the Mixed Model personality.
Select a continuous response variable as Y and construct fixed effects as you normally would with a standard least squares fit.
Use the Random Effects tab to specify random coefficients or random effects.
Use the Repeated Structure tab to select a covariance structure for model errors.
Example 1: Random Coefficient Models — Allowing Coefficients to Vary Randomly Across Subjects
In this example, we’re interested in estimating the effect on wheat yield of pre-planting moisture in the soil while allowing each variety to have random deviation from population effects. So, a random coefficient model is called for. The experiment randomly selects 10 varieties from the wheat population and assigns each to six one-acre plots of land. In total, 60 observations with six measurements of yield for each variety are collected. (The data, “Wheat,” is available in JMP’s Sample Data folder.)
I followed the steps laid out above to specify my random coefficient model. From the Fixed Effects tab, I added fixed effects (i.e., population intercept and population Moisture effect).
From the Random Effects tab, I used the Nest Random Coefficients button to specify that the intercept and Moisture effect vary randomly from one variety to another. Note that JMP’s covariance structure for random coefficients is unstructured.
From the Repeated Structure tab, I selected Residual for the model error term.
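To make the structure of this model concrete, here is a minimal Python sketch that simulates data with the same layout as the experiment (10 varieties, six plots each). Every numeric value in it (population intercept and slope, standard deviations, moisture range) is a made-up assumption for illustration, not an estimate from the Wheat data. For simplicity the random intercept and slope are drawn independently, whereas JMP's unstructured covariance also allows them to covary.

```python
import random

def simulate_wheat(n_varieties=10, n_plots=6, seed=1):
    """Simulate yield from a random coefficient model (illustrative values only)."""
    rng = random.Random(seed)
    beta0, beta1 = 30.0, 0.7                      # hypothetical population intercept and slope
    sd_int, sd_slope, sd_err = 4.0, 0.05, 0.6     # hypothetical standard deviations
    rows = []
    for variety in range(1, n_varieties + 1):
        # Variety-specific deviations from the population effects (the random effects)
        g0 = rng.gauss(0.0, sd_int)
        g1 = rng.gauss(0.0, sd_slope)
        for _ in range(n_plots):
            moisture = rng.uniform(10.0, 60.0)    # hypothetical moisture range
            error = rng.gauss(0.0, sd_err)        # residual error for this plot
            y = (beta0 + g0) + (beta1 + g1) * moisture + error
            rows.append({"Variety": variety, "Moisture": moisture, "Yield": y})
    return rows

data = simulate_wheat()
print(len(data))  # one row per plot
```

Each variety gets its own intercept and slope, which is exactly what the Nest Random Coefficients specification asks the model to estimate.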
This example is detailed in the JMP documentation. Let's examine the results. First, take a look at the Random Effects Covariance Parameter Estimates report.
The variance estimate for Intercept is 18.89 with a standard error estimate of 9.11, so the z-score is 2.07 (=18.89/9.11). Using the Normal Distribution function from the JMP Formula Editor (or a standard normal distribution table in any statistics textbook), we find the p-value to be 0.0192, indicating that the variation in baseline yield (i.e., without any pre-planting watering) across varieties is statistically significant. Similarly, we obtain the p-value for Cov(Moisture, Intercept), 0.3777, and the p-value for Var(Moisture), 0.0380. Although the covariance estimate is negative, there is no statistical evidence that this negative correlation is significant. The variation in the Moisture effect across varieties is significant at α=0.05.
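The z-score and p-value for the Intercept variance can be checked outside JMP. The sketch below uses only the Python standard library and assumes a one-sided (upper-tail) test, which is what the quoted p-value of 0.0192 corresponds to; the function name upper_tail_p is just an illustrative helper.

```python
import math

def upper_tail_p(z):
    """One-sided p-value: P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

var_intercept = 18.89   # variance estimate quoted from the report
std_error = 9.11        # its standard error, also quoted from the report

z = round(var_intercept / std_error, 2)   # rounded to two decimals, as in the text
p = upper_tail_p(z)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Using the unrounded ratio instead of 2.07 changes the p-value only in the fourth decimal place, so the conclusion is the same either way.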
The Random Coefficients report gives the BLUP (Best Linear Unbiased Predictor) values for how each variety is different from the population intercept and population Moisture effect (reported in Fixed Effects Parameter Estimates). For Variety 1, the estimated moisture effect on its yield is 0.61 (=0.66-0.05), baseline yield is 34.39 (=33.43+0.96), and the predicted yield equation is Yield=34.39+0.61*Moisture.
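The arithmetic behind the Variety 1 equation is simply fixed effect plus BLUP for each coefficient. A small Python sketch using the values quoted above (the function name predict_yield_v1 and the moisture value of 20 are hypothetical, chosen only to show the calculation):

```python
# Fixed-effect estimates (population values) quoted in the text
pop_intercept, pop_slope = 33.43, 0.66
# BLUP deviations for Variety 1 quoted in the text
blup_intercept, blup_slope = 0.96, -0.05

intercept_v1 = pop_intercept + blup_intercept   # variety-specific baseline yield
slope_v1 = pop_slope + blup_slope               # variety-specific moisture effect

def predict_yield_v1(moisture):
    """Predicted yield for Variety 1 at a given pre-planting moisture level."""
    return intercept_v1 + slope_v1 * moisture

print(f"Yield = {intercept_v1:.2f} + {slope_v1:.2f} * Moisture")
print(f"Predicted yield at Moisture = 20: {predict_yield_v1(20):.2f}")
```

The same two additions, repeated with each variety's BLUP values, give one prediction equation per variety.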
Combining both the fixed effects and random coefficient estimates, we find a significant overall effect on wheat yield of moisture and discover significant variation in the moisture effect across different varieties. The random coefficient model produces a BLUP prediction equation for yield for each variety.
Other Specifications of Random Coefficient Models
The individual growth model is a type of random coefficient model in which a random time effect is estimated for each individual. After adding a continuous time variable (e.g., day, month, etc.) as a random effect, use the Nest Random Coefficients button to request a separate slope and intercept for each individual.
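Written out, an individual growth model of this kind might look as follows. The indices (i for individual, j for measurement occasion), the time variable t, and the matrix G are notation assumed here for illustration:

```latex
\begin{aligned}
y_{ij} &= (\beta_0 + \gamma_{0i}) + (\beta_1 + \gamma_{1i})\, t_{ij} + \varepsilon_{ij}, \\
(\gamma_{0i}, \gamma_{1i})^{\top} &\sim N(0, G), \qquad \varepsilon_{ij} \sim N(0, \sigma^2),
\end{aligned}
```

where β0 and β1 are the population intercept and growth rate, and γ0i and γ1i are individual i's deviations from them. The residual term shown is the simplest case; other error structures can be chosen on the Repeated Structure tab.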
In education research, subjects are often nested in a hierarchical order. By adding multiple groups of random effect statements you can fit hierarchical linear models/multilevel models.
Stay tuned. In my next blog post, I will discuss using mixed models for panel data, repeated measures and spatial regression.