Apr 29, 2014 10:04 AM | Last Modified: Sep 28, 2017 12:29 PM
In an earlier blog post, I introduced the new Mixed Model capability in JMP Pro 11 and showed an example of random coefficient models. In this post, I continue my discussion of using mixed models for repeated measures and panel data. I’ll leave modeling geospatial data as well as tips and tricks for a future post.
Example 2: Analysis of Repeated Measures — accounting for correlated errors
In the analysis of repeated measures, multiple measurements of a response are collected from the same subjects over time. This example is taken from JMP documentation. In the study, subjects were randomly assigned to different treatment groups. Each subject’s total cholesterol level was measured several times during the clinical trial. The objective of the study is to test whether new drugs are effective at lowering cholesterol. What makes the analysis distinct is the correlation of the measurements within a subject. Failure to account for it often leads to incorrect conclusions about the treatment effect. (The data, Cholesterol Stacked, is available in the JMP software’s Sample Data Directory. See my earlier post for more information.)
JMP Pro offers three commonly used covariance structures:
Unstructured provides a flexible structure that estimates a separate covariance for every pair of measurement times. In this example of six repeated measures, 15 covariance parameters as well as six variance parameters will be estimated. This structure is the most flexible but carries a risk of overfitting.
AR(1) (first-order autoregressive) estimates the correlation between two measurements that are one unit of time apart; the correlation declines as the time difference increases. This is a parsimonious structure with only two parameters to be estimated: a variance and an autocorrelation.
CS (compound symmetry) postulates that the covariance is constant regardless of how far apart the measurements are. The number of parameters to be estimated is two.
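To make the parameter counts concrete, here is a small sketch (in Python with NumPy, outside of JMP) that builds the AR(1) and CS covariance matrices for six equally spaced measurements. The values of the variance and correlation parameters are illustrative, not estimates from the Cholesterol data:

```python
import numpy as np

n = 6          # repeated measurements per subject
sigma2 = 2.0   # illustrative within-subject variance
rho = 0.95     # illustrative correlation parameter

t = np.arange(n)

# AR(1): covariance decays as rho**|i - j|; only 2 free parameters.
ar1 = sigma2 * rho ** np.abs(np.subtract.outer(t, t))

# CS: a common variance on the diagonal and a constant covariance
# sigma2 * rho everywhere off the diagonal; also 2 free parameters.
cs = sigma2 * (rho * np.ones((n, n)) + (1 - rho) * np.eye(n))

# Unstructured: every variance and covariance is free.
n_unstructured = n * (n + 1) // 2
print(n_unstructured)  # 21 = 6 variances + 15 covariances
```

The AR(1) matrix shrinks toward zero as measurements move further apart in time, while the CS matrix keeps the same covariance between any two measurements on a subject.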
I follow the steps outlined in my previous post to specify the mixed model for the analysis of repeated measures. The Fixed Effects part of the model includes Treatment, Month, AM/PM and their interactions.
Fixed effects part of the model
I will consider different covariance structures for the within-subject errors. First, let’s consider Unstructured. Apply the Time column as Repeated and the Patient column as Subject — this defines the repeated measurements within a subject. It is important to note that JMP requires that the Subject column be uniquely valued and that the Repeated column be categorical for the Unstructured option.
Unstructured Covariance Structure
I now focus my discussion on the Repeated Effects Covariance Parameter Estimates.
Results using Unstructured
One way of testing the statistical significance of the covariance estimates is to calculate the z-scores and find their p-values, as I did in my example of a random coefficient model. Alternatively, we can check the confidence limits: If the 95% confidence interval for a covariance estimate includes zero, then the estimate is not statistically significantly different from zero at α = 5%. After sorting the report, we can see that all six variance estimates are significantly different from zero but most of the covariance estimates are not. This suggests that a parsimonious structure, such as AR(1), should be considered. The Fixed Effects report shows a highly significant treatment effect. Cholesterol level is also found to vary significantly from month to month and from morning to afternoon.
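Both checks are easy to reproduce by hand. A minimal sketch in Python, using an illustrative estimate and standard error rather than values taken from the report:

```python
import math

estimate, se = 3.2, 1.9  # illustrative covariance estimate and its SE
z = estimate / se        # Wald z-score

# Two-sided p-value from the standard normal CDF (via the error function).
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# 95% confidence interval; if it contains zero, the estimate is not
# significantly different from zero at the 5% level.
lo, hi = estimate - 1.96 * se, estimate + 1.96 * se
print(lo < 0 < hi)  # True for these illustrative numbers
```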
Next, we consider AR(1). The Repeated column used in AR(1) must be a continuous variable. So, Days — the number of days from the trial start date at each measurement — is used.
AR(1) Covariance Structure
The Repeated Effects Covariance Parameter Estimates report shows that the within-subject correlation is 0.95 and statistically significant. Fixed effects results are similar (not shown) — the treatment effect and time effects are statistically significant.
Results using AR(1)
To complete our example, let’s fit the model with a CS structure. To do so, select Residual as the Repeated Covariance Structure — there is no need to specify Repeated and Subject columns with this option; instead, we add Patient as a random subject effect on the Random Effects tab. That is, within-subject covariance is modeled through the random subject effect. For more details on the implementation of the compound symmetry structure in JMP, refer to JMP documentation.
CS Structure with random subject effect and residual error
Based on the 95% confidence limits, the covariance between any two measures on the same subject is not statistically significant at α = 0.05 (actually, p-value = 0.0621). Fixed effect test results are similar to the previous models and are not shown here.
Results using CS
So, which repeated covariance structure should be adopted? We can compare AICc (smaller is better) from the Fit Statistics reports (not shown): Unstructured 703.84, AR(1) 652.63 and CS 832.55. So, AR(1) is the winner.
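The selection rule is just a minimum over the reported AICc values, which a couple of lines of Python make explicit:

```python
# AICc values from the Fit Statistics reports; smaller is better.
aicc = {"Unstructured": 703.84, "AR(1)": 652.63, "CS": 832.55}
best = min(aicc, key=aicc.get)
print(best)  # AR(1)
```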
Example 3: Panel Data Models — controlling for unobserved heterogeneity
This example is taken from Vella and Verbeek (1998), which is discussed in Introductory Econometrics by Jeffrey Wooldridge as Example 14.4. See references below for more information and where to get the data.
The original data came from the National Longitudinal Survey of Youth 1979 Cohort (NLSY79). In this analysis, each of the 545 male workers worked every year from 1980 through 1987. We’re interested in estimating the effect of union membership on wage earnings, controlling for education, work experience, ethnicity, etc. Although NLSY79 collects detailed background information on the workers that can be used as control variables, there are still individual differences that cannot be observed or measured. Panel data provides a way of accounting for such individual heterogeneity: If it can be assumed to be uncorrelated with all the explanatory variables, we can treat the heterogeneity as a random effect.
I follow Wooldridge’s discussion in his book to specify a wage equation using panel data.
Fixed effects part of the Log(Wage) Equation
Random effects part of the Log(Wage) Equation
I apply the Residual structure to the model error term. The resulting model is called a one-way random effects model in econometrics, also known as a variance components model. The results are shown below.
One-way Random Effect Model Results
From the Random Effects Covariance Parameter Estimates report, we find that individual heterogeneity accounts for 47.8% (= 0.11/(0.11 + 0.12)) of the total variation. This indicates a large unobserved effect, suggesting that an OLS analysis ignoring it would likely yield misleading results.
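The 47.8% figure is the share of total variance due to the individual effect (the intraclass correlation), computed directly from the two variance components quoted above:

```python
var_subject = 0.11   # individual heterogeneity (random effect) variance
var_residual = 0.12  # residual variance

# Proportion of total variation due to unobserved individual effects,
# i.e., the intraclass correlation of the composite error.
icc = var_subject / (var_subject + var_residual)
print(round(icc, 3))  # 0.478
```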
The Fixed Effects Parameter Estimates report shows an estimated rate of return to education at 9.2% and a union premium of 10.5%, both of which are highly statistically significant.
Francis Vella and Marno Verbeek (1998), "Whose Wages Do Unions Raise? A Dynamic Model of Unionism and Wage Rate Determination for Young Men," Journal of Applied Econometrics, Vol. 13, No. 2, pp. 163-183. (Data can be downloaded from the Journal’s website.)
Jeffrey Wooldridge (2012), Introductory Econometrics: A Modern Approach, Cengage Learning.