In an earlier blog post, I introduced the new Mixed Model capability in JMP Pro 11 and showed an example of random coefficient models. In this post, I continue my discussion of using mixed models for repeated measures and panel data. I’ll leave modeling geospatial data as well as tips and tricks for a future post.
In the analysis of repeated measures, multiple measurements of a response are collected from the same subjects over time. This example is taken from JMP documentation. In the study, subjects were randomly assigned to different treatment groups. Each subject’s total cholesterol level was measured several times during the clinical trial. The objective of the study is to test whether new drugs are effective at lowering cholesterol. What makes the analysis distinct is the correlation of the measurements within a subject. Failure to account for it often leads to incorrect conclusions about the treatment effect. (The data, Cholesterol Stacked, is available in the JMP software’s Sample Data Directory. See my earlier post for more information.)
JMP Pro offers three commonly used covariance structures: Unstructured, AR(1) and Residual (the last combined with a random subject effect to produce compound symmetry). I fit all three to the cholesterol data below.
I follow the steps outlined in my previous post to specify the mixed model for the analysis of repeated measures. The Fixed Effects part of the model includes Treatment, Month, AM/PM and their interactions.
Fixed effects part of the model
I will consider different covariance structures for the within-subject errors. First, let's consider Unstructured. Assign the Time column as Repeated and the Patient column as Subject; this defines the repeated measurements within a subject. Note that for the Unstructured option, JMP requires the Subject column to be uniquely valued and the Repeated column to be categorical.
I now focus my discussion on the Repeated Effects Covariance Parameter Estimates.
One way of testing the statistical significance of the covariance estimates is to calculate z-scores and find their p-values, as I did in my example of a random coefficient model. Alternatively, we can check the confidence limits: if the 95% confidence interval for a covariance estimate includes zero, the estimate is not statistically different from zero at α = 0.05. After sorting the report, we can see that all six variance estimates are significantly different from zero but most of the covariance estimates are not. This suggests that a more parsimonious structure, such as AR(1), should be considered.
The Fixed Effects report shows a highly significant treatment effect. Cholesterol level is also found to vary significantly from month to month and from morning to afternoon.
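Both checks, the z-score p-value and the confidence-limit test, are easy to reproduce by hand. The sketch below uses made-up estimate and standard-error pairs (not the values from the JMP report) and a normal approximation for the Wald interval:

```python
from math import erf, sqrt

def wald_test(estimate, std_error):
    """Wald z-test and 95% normal-approximation confidence interval
    for a single covariance parameter estimate."""
    z = estimate / std_error
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    half_width = 1.959963984540054 * std_error  # z_{0.975}
    lower, upper = estimate - half_width, estimate + half_width
    significant = not (lower <= 0 <= upper)  # does the CI exclude zero?
    return z, p_value, (lower, upper), significant

# hypothetical values, not taken from the cholesterol report
z, p, ci, sig = wald_test(4.2, 1.1)   # a variance-like estimate
print(round(z, 2), round(p, 4), sig)
```

The two checks agree by construction: the 95% interval excludes zero exactly when the two-sided p-value falls below 0.05.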
Next, we consider AR(1). The Repeated column used with AR(1) must be continuous, so Days, the number of days from the trial start date at each measurement, is used in place of Time.
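With a continuous time column, the AR(1) structure implies that the covariance between two measurements decays geometrically with the time lag between them: Cov(e_i, e_j) = σ²ρ^|d_i − d_j|. A minimal sketch, with hypothetical measurement days and parameters (not the fitted values):

```python
import numpy as np

def ar1_covariance(days, sigma2, rho):
    """Within-subject covariance matrix implied by an AR(1) structure
    on a continuous time column: Cov(e_i, e_j) = sigma2 * rho**|d_i - d_j|."""
    d = np.asarray(days, dtype=float)
    lags = np.abs(d[:, None] - d[None, :])  # pairwise time lags
    return sigma2 * rho ** lags

# hypothetical measurement days and parameters
V = ar1_covariance([0, 1, 2, 4], sigma2=1.0, rho=0.95)
print(np.round(V, 3))
```

Note the parsimony relative to Unstructured: only two parameters (σ² and ρ) describe the whole within-subject covariance matrix, regardless of how many measurements each subject has.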
The Repeated Effects Covariance Parameter Estimates report shows that the within-subject correlation is 0.95 and statistically significant. The fixed effects results (not shown) are similar: the treatment and time effects are statistically significant.
To complete our example, let's fit the model with a compound symmetry (CS) structure. To do so, select Residual as the Repeated Covariance Structure. There is no need to specify Repeated and Subject columns with this option; instead, we add Patient as a random subject effect on the Random Effects tab. That is, the within-subject covariance is modeled through the random subject effect. For more details on the implementation of the compound symmetry structure in JMP, refer to the JMP documentation.
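Modeling the within-subject covariance through a random subject effect yields a constant covariance between any two measurements on the same subject: the subject variance appears in every off-diagonal cell, and the subject-plus-residual variance on the diagonal. A small sketch with hypothetical variance components (not the fitted values):

```python
import numpy as np

def cs_covariance(n_times, var_subject, var_resid):
    """Compound-symmetry covariance implied by a random subject effect:
    var_subject + var_resid on the diagonal, var_subject off the diagonal."""
    J = np.full((n_times, n_times), var_subject)  # shared subject variance
    return J + var_resid * np.eye(n_times)        # residual variance on diagonal

# hypothetical variance components
V = cs_covariance(4, var_subject=2.0, var_resid=1.0)
icc = 2.0 / (2.0 + 1.0)  # common within-subject correlation
print(np.round(V, 1), round(icc, 3))
```

The single off-diagonal value is why CS is the most restrictive of the three structures considered here: every pair of measurements on a subject is assumed equally correlated, no matter how far apart in time.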
Based on the 95% confidence limits, the covariance between any two measurements on the same subject is not statistically significant at α = 0.05 (p-value = 0.0621). Fixed effect test results are similar to those of the previous models and are not shown here.
So, which repeated covariance structure should be adopted? We can compare the AICc values from the Fit Statistics reports (not shown): 703.84 for Unstructured, 652.63 for AR(1) and 832.55 for CS. AR(1), with the smallest AICc, is the winner.
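As a quick sketch of this comparison: the helper below restates the standard small-sample correction that turns AIC into AICc, and the selection step simply picks the structure with the lowest value among those quoted above.

```python
def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC: AIC plus the correction 2k(k+1)/(n-k-1),
    where k is the number of parameters and n the sample size."""
    aic = -2.0 * log_likelihood + 2.0 * k
    return aic + 2.0 * k * (k + 1) / (n - k - 1)

# the AICc values reported above; lower is better
models = {"Unstructured": 703.84, "AR(1)": 652.63, "CS": 832.55}
best = min(models, key=models.get)
print(best)  # AR(1)
```

AICc rewards fit while penalizing parameter count, which is exactly the trade-off at stake: Unstructured spends many parameters on covariances that the confidence limits suggested were indistinguishable from zero.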
This example is taken from Vella and Verbeek (1998), which is discussed in Introductory Econometrics by Jeffrey Wooldridge as Example 14.4. See the references below for more information and for where to get the data.
The original data came from the National Longitudinal Survey of Youth 1979 Cohort (NLSY79). In this analysis, each of the 545 male workers worked every year from 1980 through 1987. We're interested in estimating the effect of union membership on wage earnings, controlling for education, work experience, ethnicity, etc. Although NLSY79 collects detailed background information on the workers that can be used as control variables, there are still individual differences that cannot be observed or measured. Panel data provide a way of accounting for this individual heterogeneity: if it can be assumed to be uncorrelated with all the explanatory variables, we can treat the heterogeneity as a random effect.
I follow Wooldridge's discussion in his book to specify a wage equation using panel data.
I apply the Residual structure to the model error term. In econometrics, this model, y_it = x_it β + a_i + u_it with a random individual effect a_i, is called the one-way random effects model; it is also known as a variance components model. The results are shown below.
From the Random Effects Covariance Parameter Estimates report, we find that individual heterogeneity accounts for 47.8% (=0.11/(0.11+0.12)) of the total variation. This indicates a large unobserved effect, suggesting an OLS analysis would likely yield misleading results.
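The 47.8% figure is the intraclass correlation of the one-way random effects model, computed directly from the two variance component estimates quoted above:

```python
def heterogeneity_share(var_individual, var_idiosyncratic):
    """Share of total error variance due to the unobserved individual effect
    (the intraclass correlation of a one-way random effects model)."""
    return var_individual / (var_individual + var_idiosyncratic)

# variance component estimates quoted in the post
share = heterogeneity_share(0.11, 0.12)
print(round(share, 3))  # 0.478
```

A share this large means observations on the same worker are strongly correlated across years, which is precisely the correlation that pooled OLS standard errors ignore.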
The Fixed Effects Parameter Estimates report shows an estimated rate of return to education at 9.2% and a union premium of 10.5%, both of which are highly statistically significant.
Francis Vella and Marno Verbeek (1998), "Whose Wages Do Unions Raise? A Dynamic Model of Unionism and Wage Rate Determination for Young Men," Journal of Applied Econometrics, Vol. 13, No. 2, pp. 163-183. (Data can be downloaded from the Journal’s website.)
Jeffrey Wooldridge (2012), Introductory Econometrics: A Modern Approach, Cengage Learning.