/******************************************************************* Chapter 9. Example: DENTAL DATA Analysis of the dental study data by fitting a random coefficient model in time using PROC MIXED. - the repeated measurement factor is age (time) - there is one "treatment" factor, gender The model for each child is assumed to be a straight line. The intercepts and slopes may have different means depending on gender, with the same covariance matrix D for each gender. We use the RANDOM and REPEATED statements to fit models that make several different assumptions about the forms of the matrices Ri and D. *******************************************************************/ options ls=80 ps=59 nodate; run; /****************************************************************** Read in the data set (See Example 1 of Chapter 4) *******************************************************************/ data dent1; infile "dental.txt"; input obsno child age distance gender; run; /******************************************************************* Sort the data so we can do gender-by-gender fits. *******************************************************************/ proc sort data=dent1; by gender; run; proc print data=dent1; run; /******************************************************************* ******************************************************************** PROC MIXED ******************************************************************** ******************************************************************* Use PROC MIXED to fit the random coefficient model via the RANDOM statement. For all of the fits, we use usual normal ML rather than REML (the default). In all cases, we use the usual parameterization for the mean model. The SOLUTION option in the MODEL statement requests that the estimates of the regression parameters be printed. RANDOM statement: To fit a random coefficient model, we must specify that both intercept and slope are random in the RANDOM statement. REPEATED: If no REPEATED statement appears, then PROC MIXED assumes that Ri = sigma^2*I. Otherwise, we use a REPEATED statement to set a structure for Ri with the TYPE = option. *******************************************************************/ * MODEL (i): Y = X beta + Zb + Eps , Eps ~ sigma2 N(0, I) ; * Ri = diagonal with constant variance sigma^2 same in both genders; * No REPEATED statement necessary to fit this Ri (default); * D = (2x2) unstructured matrix same for both genders; * Specified in the RANDOM statement; title 'RANDOM COEFFICIENT MODEL WITH DIAGONAL WITHIN-CHILD'; title2 'COVARIANCE MATRIX WITH CONSTANT VARIANCE SAME FOR EACH GENDER'; title3 'SAME D MATRIX FOR BOTH GENDERS'; proc mixed method=ml data=dent1; class gender child; model distance = gender gender*age / noint solution; random intercept age / type=un subject=child g gcorr v vcorr; estimate 'diff in mean slope' gender 0 0 gender*age 1 -1; contrast 'overall gender diff' gender 1 -1, gender*age 1 -1 /chisq; run; /********************************************************************** RANDOM COEFFICIENT MODEL WITH DIAGONAL WITHIN-CHILD COVARIANCE MATRIX WITH CONSTANT VARIANCE SAME FOR EACH GENDER SAME D MATRIX FOR BOTH GENDERS The Mixed Procedure Estimated G Matrix Row Effect child Col1 Col2 1 Intercept 1 4.5569 -0.1983 2 age 1 -0.1983 0.02376 # # The G and GCORR options ask that the D matrix and the corresponding correlation matrix would be printed # Estimated G Correlation Matrix Row Effect child Col1 Col2 1 Intercept 1 1.0000 -0.6025 2 age 1 -0.6025 1.0000 Estimated V Matrix for child 1 Row Col1 Col2 Col3 Col4 1 4.6216 2.8891 2.8727 2.8563 2 2.8891 4.6839 3.0464 3.1251 3 2.8727 3.0464 4.9363 3.3938 4 2.8563 3.1251 3.3938 5.3788 # # The V and VCORR options ask that the overall Sigma matrix be printed (for the first subject or particular subjects). # Estimated V Correlation Matrix for child 1 Row Col1 Col2 Col3 Col4 1 1.0000 0.6209 0.6014 0.5729 2 0.6209 1.0000 0.6335 0.6226 3 0.6014 0.6335 1.0000 0.6586 4 0.5729 0.6226 0.6586 1.0000 # # The first 3 covariance parameters estimates reffer to the elements of D matrix, D11, D12, D22, also shown in G matrix # # The estimate of sigma2 in the assumed model Ri = sigma2 I is in the Covariance Parameter Estimates table # (along with the distinct elements o fD repeated) and is equal to 1.716 (Residual). # Covariance Parameter Estimates Cov Parm Subject Estimate UN(1,1) child 4.5569 UN(2,1) child -0.1983 UN(2,2) child 0.02376 Residual 1.7162 Fit Statistics -2 Log Likelihood 427.8 AIC (smaller is better) 443.8 AICC (smaller is better) 445.3 BIC (smaller is better) 454.2 The Mixed Procedure Null Model Likelihood Ratio Test DF Chi-Square Pr > ChiSq 3 50.44 <.0001 Solution for Fixed Effects Standard Effect gender Estimate Error DF t Value Pr > |t| gender 0 17.3727 1.1820 54 14.70 <.0001 gender 1 16.3406 0.9801 54 16.67 <.0001 age*gender 0 0.4795 0.09980 54 4.80 <.0001 age*gender 1 0.7844 0.08275 54 9.48 <.0001 Type 3 Tests of Fixed Effects Num Den Effect DF DF F Value Pr > F gender 2 54 247.00 <.0001 age*gender 2 54 56.46 <.0001 # # Estimate: beta_G1 - beta_B1 # Estimates Standard Label Estimate Error DF t Value Pr > |t| diff in mean slope -0.3048 0.1296 54 -2.35 0.0224 # # H0: beta_G0 - beta_B0 =0 # beta_G1 - beta_B1 = 0 # # NOTICE: the 'comma' in the contrast statement that implies multiple hypothesis testing # Contrasts Num Den Label DF DF Chi-Square F Value Pr > ChiSq Pr > F overall gender diff 2 54 14.19 7.10 0.0008 0.0018 **********************************************************************/ * MODEL (ii): Y = X beta + Zb + Eps , Eps ~ sigma1.2 N(0, I) GIRLS; * Eps ~ sigma2.2 N(0, I) BOYS * Fit the same model but with a separate diagonal Ri matrix for; * each gender. Thus, there are 2 separate variances sigma^2_(G and B); * D still = (2x2) unstructured matrix same for both genders; * Specified in the RANDOM statement; title 'RANDOM COEFFICIENT MODEL WITH DIAGONAL WITHIN-CHILD'; title2 'COVARIANCE MATRIX WITH SEPARATE CONSTANT VARIANCE FOR EACH GENDER'; title3 'SAME D MATRIX FOR BOTH GENDERS'; proc mixed method=ml data=dent1; class child gender; model distance = gender gender*age / noint solution; repeated / group=gender subject=child; random intercept age / type=un subject=child g gcorr v vcorr; estimate 'diff in mean slope' gender 0 0 gender*age 1 -1; contrast 'overall gender diff' gender 1 -1, gender*age 1 -1 /chisq; run; /**************************************************************** Covariance Parameter Estimates Cov Parm Subject Group Estimate UN(1,1) child 3.1978 UN(2,1) child -0.1103 UN(2,2) child 0.01976 Residual child gender 0 0.4449 Residual child gender 1 2.6294 # # Here, we assume that the within-child variance is different # depending on gender; from the table Covariance Parameter Estimates # the estimates are given 0.445 for GIRLS and 2.629 for BOYS # # These estimates are quite diŽerent. The implied matrix Sigma.i is different for # BOYS and GIRLS. Printes is the Sigma.i for GIRLS *****************************************************************/ * MODEL (iii); * Ri is AR(1) with the same variance and rho value for each gender; * Specified in the REPEATED statement; * D still = (2x2) unstructured matrix same for both genders; * Specified in the RANDOM statement; title 'RANDOM COEFFICIENT MODEL WITH AR(1) WITHIN-CHILD'; title2 'CORRELATION MATRIX WITH CONSTANT VARIANCE SAME FOR EACH GENDER'; title3 'SAME D MATRIX FOR BOTH GENDERS'; proc mixed method=ml data=dent1; class gender child ; model distance = gender gender*age / noint solution ; random intercept age / type=un subject=child g gcorr v vcorr; * MODEL (iv); * random intercept age / type=un group=gender subject=child g gcorr v vcorr; repeated / type=ar(1) subject=child rcorr; estimate 'diff in mean slope' gender 0 0 gender*age 1 -1; contrast 'overall gender diff' gender 1 -1, gender*age 1 -1 /chisq; run; * MODEL (v) * Ri is the sum of two components, an AR(1) component for fluctuations; * and a diagonal component with variance sigma^2 common to both genders; * The LOCAL option adds the diagonal component to the AR(1) structure; * specified in the REPEATED statement; * D still = (2x2) unstructured matrix same for both genders; * Specified in the RANDOM statement; title 'RANDOM COEFFICIENT MODEL WITH AR(1) + COMMON MEAS ERROR WITHIN-CHILD'; title2 'CORRELATION MATRIX WITH CONSTANT VARIANCE SAME FOR EACH GENDER'; title3 'SAME D MATRIX FOR BOTH GENDERS'; proc mixed method=ml data=dent1; class gender child ; model distance = gender gender*age / noint solution ; random intercept age / type=un subject=child g gcorr v vcorr; repeated / type=ar(1) local subject=child rcorr; estimate 'diff in mean slope' gender 0 0 gender*age 1 -1; contrast 'overall gender diff' gender 1 -1, gender*age 1 -1 /chisq; run; /***************************************************************** Model (v) includes two components for Ri: AR(1) + sigma2 I Estimated R Correlation Matrix for child 1 Row Col1 Col2 Col3 Col4 1 1.0000 -0.2256 0.2241 -0.2227 2 -0.2256 1.0000 -0.2256 0.2241 3 0.2241 -0.2256 1.0000 -0.2256 4 -0.2227 0.2241 -0.2256 1.0000 Estimated G Matrix Row Effect child Col1 Col2 1 Intercept 1 6.9045 -0.4333 2 age 1 -0.4333 0.04828 Estimated G Correlation Matrix Row Effect child Col1 Col2 1 Intercept 1 1.0000 -0.7505 2 age 1 -0.7505 1.0000 Estimated V Matrix for child 1 Row Col1 Col2 Col3 Col4 1 4.5375 2.6344 3.2041 2.4504 2 2.6344 4.5423 2.8323 3.5951 3 3.2041 2.8323 4.9333 3.4165 4 2.4504 3.5951 3.4165 5.7106 # # The estimates of the parameters describing the within unit fluctuations are # variance = 1.1408 (see Residual) # the autoregressive parameter = -0.9935 (see AR(1)) # # The estimate of the variance of the measurement error is: # variance = 0.3351 (see Variance ) # Covariance Parameter Estimates Cov Parm Subject Estimate UN(1,1) child 6.9045 UN(2,1) child -0.4333 UN(2,2) child 0.04828 Variance child 0.3351 AR(1) child -0.9935 Residual 1.1408 *****************************************************************/ /******************************************************************* Inspection of the AIC and BIC values for these fits on pages 2, 6, 9, and 13 of the output (pages 337-349, textbook) shows that model (ii), where a different within-child variance is assumed for each gender and D the same seem preferable among the four models considered. The AIC and BIC values for this model are 424.0 and 435.7, respectively. Comparing the values to those for the general regression models considered in the analysis of these data in section 8.8 reveals that these AIC and BIC values seem comparable to those for the preferred model in that section, where Sigma.i was modeled as following a different compound symmetry structure for boys and girls. Thus, among all models considered for these data so far, either of these seems plausible. Model (ii) here may be more pleasing to many analysts, because it considers the two sources of variation explicitly. The key element seems to be allowing the within-child variance to be different for the two genders; allowing D to differ as well in model (iv) offered no improvement in fit. Inspection of the original data plot reveals the potential source of this result. Note that 2 of the boys, and one especially, have trajectories that seem to \bounce around" much more than those of the other children. From above, the estimate of variance for boys, was much larger than that for girls. Otherwise, the trajectories seem similarly spread out across girls and boys, supporting the choice of common D. Being able to model the covariance structure in terms of the two sources of variation explicitly makes this clear, allowing a pleasing interpretation of how the overall covariance structure differs. Such an interpretation is more dificult with the model of section 8.8. *****************************************************************/