/*******************************************************************

  CHAPTER 5, EXAMPLE 1

  Analysis of the dental study data by repeated measures analysis
  of variance using PROC GLM

  - the repeated measurement factor is age (time)
  - there is one "treatment" factor, gender

*******************************************************************/

options ls=80 ps=59 nodate; run;

data dent1; infile "dental.txt";
  input obsno child age distance gender;
run;

proc print data=dent1; run;

/******************************************************************

  The data set looks like

     1   1    8   21     0
     2   1   10   20     0
     3   1   12   21.5   0
     4   1   14   23     0
     5   2    8   21     0
     ...

  column 1      observation number
  column 2      child id number
  column 3      age
  column 4      response (distance)
  column 5      gender indicator (0=girl, 1=boy)

  The second data step changes the ages from 8, 10, 12, 14
  to 1, 2, 3, 4 so that SAS can count them when it creates a
  different data set later

*******************************************************************/

data dent1; set dent1;
  if age=8 then age=1;
  if age=10 then age=2;
  if age=12 then age=3;
  if age=14 then age=4;
  drop obsno;
run;

/*******************************************************************

  TRANSFORM DATA IN APPROPRIATE FORMAT
  [with the data record for each child on a single line.]

*******************************************************************/

proc print data=dent1; run;

/*******************************************************************

  The '4' in aa{4} below is the number of repeated measurements per
  child.  The recoded age (1-4) is used as the array index.  The BY
  statement requires dent1 to be in gender, child order; the RETURN
  at the last record for a child writes out the completed record.

*******************************************************************/

data dent2(keep=age1-age4 gender child);
  array aa{4} age1-age4;
  do age=1 to 4;
    set dent1;
    by gender child;
    aa{age}=distance;
    if last.child then return;
  end;
run;

proc print data=dent2; run;

title "TRANSFORMED DATA -- 1 RECORD/INDIVIDUAL";
proc print data=dent2(obs=5); run;

/************************************************

  Obs    age1    age2    age3    age4    child    gender

    1    21.0    20.0    21.5    23.0      1        0
    2    21.0    21.5    24.0    25.5      2        0
    3    20.5    24.0    24.5    26.0      3        0
    4    23.5    24.5    25.0    26.5      4        0
    5    21.5    23.0    22.5    23.5      5        0

************************************************/

/*******************************************************************
********************************************************************

  PROC MEANS / PLOT

********************************************************************
*******************************************************************/

/*******************************************************************

  Find the means of each gender-age combination and plot mean vs.
  age for each gender

*******************************************************************/

proc sort data=dent1; by gender age; run;

proc means data=dent1; by gender age;
  var distance;
  output out=mdent mean=mdist;
run;

proc plot data=mdent; plot mdist*age=gender; run;

/*******************************************************************
********************************************************************

  PROC GLM [SPLIT PLOT specification]

********************************************************************
*******************************************************************/

/*******************************************************************

  Construct the analysis of variance using PROC GLM via a "split plot"
  specification.  This requires that the data be represented in the
  form they are given in data set dent1.

  Note that the F ratio that PROC GLM prints out automatically for the
  gender effect (averaged across age) uses the MSE in the denominator.
  This is not the correct F ratio for testing this effect.

  The RANDOM statement asks SAS to compute the expected mean squares
  for each source of variation.  The TEST option asks SAS to compute
  the test for the gender effect (averaged across age), treating the
  child(gender) effect as random, giving the correct F ratio.  The
  other F ratios (for age and age*gender) are correct as printed.

  In older versions of SAS that do not recognize this option, this
  test could be obtained by removing the TEST option from the RANDOM
  statement and adding the statement

      test h=gender e = child(gender);

  to the call to PROC GLM.

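*******************************************************************/

/* A sketch of the older-version alternative just described, shown
   only for illustration: the same model, with the TEST option dropped
   from the RANDOM statement and an explicit TEST statement added.
   It is not needed when the TEST option is available, and it is
   assumed (not shown in the listing below) to reproduce the
   gender F ratio with MS(child(gender)) in the denominator. */

proc glm data=dent1;
  class age gender child;
  model distance = gender child(gender) age age*gender;
  random child(gender);
  test h=gender e=child(gender);
run;

/*******************************************************************

  The call that produced the output below, with the TEST option on
  the RANDOM statement, follows.
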
*******************************************************************/

proc glm data=dent1;
  class age gender child;
  model distance = gender child(gender) age age*gender;
  random child(gender) / test;
run;

/*******************************************************************

  Class       Levels    Values

  age              4    1 2 3 4
  gender           2    0 1
  child           27    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
                        19 20 21 22 23 24 25 26 27

  Number of Observations Read         108
  Number of Observations Used         108

  Dependent Variable: distance

                                  Sum of
  Source              DF         Squares    Mean Square   F Value   Pr > F
  Model               32     769.5642887     24.0488840     12.18   <.0001
  Error               75     148.1278409      1.9750379
  Corrected Total    107     917.6921296

  R-Square     Coeff Var      Root MSE    distance Mean
  0.838587      5.850026      1.405360         24.02315

  Source              DF       Type I SS    Mean Square   F Value   Pr > F
  gender               1     140.4648569    140.4648569     71.12   <.0001
  child(gender)       25     377.9147727     15.1165909      7.65   <.0001
  age                  3     237.1921296     79.0640432     40.03   <.0001
  age*gender           3      13.9925295      4.6641765      2.36   0.0781

  Source              DF     Type III SS    Mean Square   F Value   Pr > F
  gender               1     140.4648569    140.4648569     71.12   <.0001
  child(gender)       25     377.9147727     15.1165909      7.65   <.0001
  age                  3     209.4369739     69.8123246     35.35   <.0001
  age*gender           3      13.9925295      4.6641765      2.36   0.0781

  TRANSFORMED DATA -- 1 RECORD/INDIVIDUAL                              83

  The GLM Procedure
  Tests of Hypotheses for Mixed Model Analysis of Variance

  Dependent Variable: distance

  Source              DF     Type III SS    Mean Square   F Value   Pr > F
* gender               1      140.464857     140.464857      9.29   0.0054
  Error               25      377.914773      15.116591
  Error: MS(child(gender))
* This test assumes one or more other fixed effects are zero.

  Source              DF     Type III SS    Mean Square   F Value   Pr > F
  child(gender)       25      377.914773      15.116591      7.65   <.0001
* age                  3      209.436974      69.812325     35.35   <.0001
  age*gender           3       13.992529       4.664176      2.36   0.0781
  Error: MS(Error)    75      148.127841       1.975038
* This test assumes one or more other fixed effects are zero.

*******************************************************************/

/*******************************************************************
********************************************************************

  PROC GLM [REPEATED statement, PROFILE TRANSFORMATION]

********************************************************************
*******************************************************************/

/*******************************************************************

  Now carry out the same analysis using the REPEATED statement in
  PROC GLM.  This requires that the data be represented in the
  form of data set dent2.

  The NOUNI option suppresses printing of the individual analyses of
  variance for the data at each age value.

  The PRINTE option asks for the test of sphericity to be performed.

  The NOM option means "no multivariate," which means just do the
  univariate repeated measures analysis under the assumption that
  the exchangeable (compound symmetry) model is correct.

*******************************************************************/

proc glm data=dent2;
  class gender;
  model age1 age2 age3 age4 = gender / nouni;
  repeated age / printe nom;
run;

/*******************************************************************

  Level of age        1       2       3       4

  Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|

  DF = 25        age1          age2          age3          age4

  age1       1.000000      0.570699      0.661320      0.521583
                             0.0023        0.0002        0.0063

  age2       0.570699      1.000000      0.563167      0.726216
               0.0023                      0.0027        <.0001

  age3       0.661320      0.563167      1.000000      0.728098
               0.0002        0.0027                      <.0001

  age4       0.521583      0.726216      0.728098      1.000000
               0.0063        <.0001        <.0001

  E = Error SSCP Matrix

  age_N represents the contrast between the nth level of age and the last

                  age_1           age_2           age_3

  age_1         124.518          41.879          51.375
  age_2          41.879          63.405          11.625
  age_3          51.375          11.625          79.500

  Partial Correlation Coefficients from the Error SSCP Matrix of the
  Variables Defined by the Specified Transformation / Prob > |r|

  DF = 25       age_1         age_2         age_3

  age_1      1.000000      0.471326      0.516359
                             0.0151        0.0069

  age_2      0.471326      1.000000      0.163738
               0.0151                      0.4241

  age_3      0.516359      0.163738      1.000000
               0.0069        0.4241

  The GLM Procedure
  Repeated Measures Analysis of Variance

  Sphericity Tests

                                   Mauchly's
  Variables                 DF     Criterion    Chi-Square    Pr > ChiSq
  Transformed Variates       5     0.4998695     16.449181        0.0057
  Orthogonal Components      5     0.7353334     7.2929515        0.1997

  TRANSFORMED DATA -- 1 RECORD/INDIVIDUAL                              15

  The GLM Procedure
  Repeated Measures Analysis of Variance
  Tests of Hypotheses for Between Subjects Effects

  Source              DF     Type III SS    Mean Square   F Value   Pr > F
  gender               1     140.4648569    140.4648569      9.29   0.0054
  Error               25     377.9147727     15.1165909

  The GLM Procedure
  Repeated Measures Analysis of Variance
  Univariate Tests of Hypotheses for Within Subject Effects

  Source              DF     Type III SS    Mean Square   F Value   Pr > F
  age                  3     209.4369739     69.8123246     35.35   <.0001
  age*gender           3      13.9925295      4.6641765      2.36   0.0781
  Error(age)          75     148.1278409      1.9750379

                          Adj Pr > F
  Source                G - G     H - F
  age                  <.0001    <.0001
  age*gender           0.0878    0.0781
  Error(age)

  Greenhouse-Geisser Epsilon    0.8672
  Huynh-Feldt Epsilon           1.0156

*******************************************************************/

/*******************************************************************
*******************************************************************

  PROC GLM [REPEATED statement, POLYNOMIAL transformation]

*******************************************************************
*******************************************************************/

/*******************************************************************

  This call to PROC GLM redoes the basic analysis of the last.
  However, in the REPEATED statement, a different contrast of the
  parameters is specified, the POLYNOMIAL transformation.  The levels
  of "age" are equally spaced, and the values are specified.  The
  transformation produced is orthogonal polynomials for polynomial
  trends (linear, quadratic, cubic).

  The SUMMARY option asks that PROC GLM print out the results of
  tests corresponding to the contrasts in each column of the U matrix.

  The NOU option asks that printing of the univariate analysis of
  variance be suppressed (we already did it in the previous PROC GLM
  call).

  The PRINTM option prints out the U matrix corresponding to the
  orthogonal polynomial contrasts.  SAS calls this matrix M, and
  actually prints out its transpose (our U').

  For the orthogonal polynomial transformation, SAS uses the
  normalized version of the U matrix.  Thus, the SSs from the
  individual ANOVAs for each column will add up to the Gender by Age
  interaction SS (and similarly for the within-unit error SS).

*******************************************************************/

proc glm data=dent2;
  class gender;
  model age1 age2 age3 age4 = gender / nouni;
  repeated age 4 (8 10 12 14) polynomial / summary nou nom printm;
run;

/*******************************************************************

  Repeated Measures Level Information

  Dependent Variable      age1      age2      age3      age4
  Level of age               8        10        12        14

  age_N represents the nth degree polynomial contrast for age

  M Matrix Describing Transformed Variables

                  age1            age2            age3            age4
  age_1   -.6708203932    -.2236067977    0.2236067977    0.6708203932
  age_2   0.5000000000    -.5000000000    -.5000000000    0.5000000000
  age_3   -.2236067977    0.6708203932    -.6708203932    0.2236067977

  TRANSFORMED DATA -- 1 RECORD/INDIVIDUAL                              19

  The GLM Procedure
  Repeated Measures Analysis of Variance
  Tests of Hypotheses for Between Subjects Effects

  Source              DF     Type III SS    Mean Square   F Value   Pr > F
  gender               1     140.4648569    140.4648569      9.29   0.0054
  Error               25     377.9147727     15.1165909

  The GLM Procedure
  Repeated Measures Analysis of Variance
  Analysis of Variance of Contrast Variables

  age_N represents the nth degree polynomial contrast for age

  Contrast Variable: age_1

  Source              DF     Type III SS    Mean Square   F Value   Pr > F
  Mean                 1     208.2660038    208.2660038     88.00   <.0001
  gender               1      12.1141519     12.1141519      5.12   0.0326
  Error               25      59.1673295      2.3666932

  Contrast Variable: age_2

  Source              DF     Type III SS    Mean Square   F Value   Pr > F
  Mean                 1      0.95880682     0.95880682      0.92   0.3465
  gender               1      1.19954756     1.19954756      1.15   0.2935
  Error               25     26.04119318     1.04164773

  Contrast Variable: age_3

  Source              DF     Type III SS    Mean Square   F Value   Pr > F
  Mean                 1      0.21216330     0.21216330      0.08   0.7739
  gender               1      0.67882997     0.67882997      0.27   0.6081
  Error               25     62.91931818     2.51677273

*******************************************************************/

/*******************************************************************
*******************************************************************

  PROC GLM [REPEATED statement, Helmert transformation]

*******************************************************************
*******************************************************************/

/*******************************************************************

  For comparison, we do the same analysis as above, but use the
  Helmert matrix instead.

  SAS does NOT use the normalized version of the Helmert
  transformation matrix.  Thus, the SSs from the individual ANOVAs
  for each column will NOT add up to the Gender by Age interaction SS
  (similarly for within-unit error).  However, the F ratios are
  correct.

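*******************************************************************/

/* A minimal sketch, not part of the original program, of why the
   unnormalized sums of squares do not add up.  It assumes the three
   Helmert rows are (1, -1/3, -1/3, -1/3), (0, 1, -1/2, -1/2) and
   (0, 0, 1, -1), i.e. each level contrasted with the mean of the
   subsequent levels.  Each contrast's SS is inflated by the squared
   length of its row, so the ratio of hypothesis SS to error SS, and
   hence the F ratio, is unaffected.  The variable names len1-len3
   and c1-c3 are made up for this illustration. */

data _null_;
  /* squared lengths of the three unnormalized Helmert rows */
  len1 = 1 + 3*(1/3)**2;   /* row (1, -1/3, -1/3, -1/3) */
  len2 = 1 + 2*(1/2)**2;   /* row (0, 1, -1/2, -1/2)    */
  len3 = 1 + 1;            /* row (0, 0, 1, -1)         */
  /* multipliers that rescale each row to unit length */
  c1 = 1/sqrt(len1);
  c2 = 1/sqrt(len2);
  c3 = 1/sqrt(len3);
  put len1= len2= len3=;
  put c1= c2= c3=;
run;

/*******************************************************************

  Dividing an unnormalized SS by the corresponding squared length
  gives the SS for the normalized contrast.  The Helmert analysis
  itself follows.
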
********************************************************************/

proc glm data=dent2;
  class gender;
  model age1 age2 age3 age4 = gender / nouni;
  repeated age 4 (8 10 12 14) helmert / summary nou nom printm;
run;

/********************************************************************

  Repeated Measures Level Information

  Dependent Variable      age1      age2      age3      age4
  Level of age               8        10        12        14

  age_N represents the contrast between the nth level of age and the
  mean of subsequent levels

  M Matrix Describing Transformed Variables

                  age1            age2            age3            age4
  age_1    1.000000000    -0.333333333    -0.333333333    -0.333333333
  age_2    0.000000000     1.000000000    -0.500000000    -0.500000000
  age_3    0.000000000     0.000000000     1.000000000    -1.000000000

  The GLM Procedure
  Repeated Measures Analysis of Variance
  Tests of Hypotheses for Between Subjects Effects

  Source              DF     Type III SS    Mean Square   F Value   Pr > F
  gender               1     140.4648569    140.4648569      9.29   0.0054
  Error               25     377.9147727     15.1165909

  TRANSFORMED DATA -- 1 RECORD/INDIVIDUAL                              24

  The GLM Procedure
  Repeated Measures Analysis of Variance
  Analysis of Variance of Contrast Variables

  age_N represents the contrast between the nth level of age and the
  mean of subsequent levels

  Contrast Variable: age_1

  Source              DF     Type III SS    Mean Square   F Value   Pr > F
  Mean                 1     146.8395997    146.8395997     45.43   <.0001
  gender               1       4.5679948      4.5679948      1.41   0.2457
  Error               25      80.8106061      3.2324242

  Contrast Variable: age_2

  Source              DF     Type III SS    Mean Square   F Value   Pr > F
  Mean                 1     111.9886890    111.9886890     39.07   <.0001
  gender               1      13.0998001     13.0998001      4.57   0.0425
  Error               25      71.6548295      2.8661932

  Contrast Variable: age_3

  Source              DF     Type III SS    Mean Square   F Value   Pr > F
  Mean                 1     49.29629630    49.29629630     15.50   0.0006
  gender               1      3.66666667     3.66666667      1.15   0.2932
  Error               25     79.50000000     3.18000000

********************************************************************/

/*******************************************************************
*******************************************************************

  PROC GLM [MANOVA statement, NORMALIZED
  HELMERT transformation]

*******************************************************************
*******************************************************************/

/*******************************************************************

  Here, we manually perform the same analysis, but using the
  NORMALIZED version of the Helmert transformation matrix.
  We get each individual test separately using the PROC GLM
  MANOVA statement.

********************************************************************/

proc glm data=dent2;
  model age1 age2 age3 age4 = gender / nouni;
  manova h=gender
         m=0.866025404*age1 - 0.288675135*age2
           - 0.288675135*age3 - 0.288675135*age4;
  manova h=gender
         m=0.816496581*age2 - 0.40824829*age3 - 0.40824829*age4;
  manova h=gender
         m=0.707106781*age3 - 0.707106781*age4;
run;

/********************************************************************

  TRANSFORMED DATA -- 1 RECORD/INDIVIDUAL                              26

  The GLM Procedure
  Multivariate Analysis of Variance

  M Matrix Describing Transformed Variables

                  age1            age2            age3            age4
  MVAR1    0.866025404    -0.288675135    -0.288675135    -0.288675135

  TRANSFORMED DATA -- 1 RECORD/INDIVIDUAL                              27

  The GLM Procedure
  Multivariate Analysis of Variance

  Characteristic Roots and Vectors of: E Inverse * H, where
  H = Type III SSCP Matrix for gender
  E = Error SSCP Matrix

  Variables have been transformed by the M Matrix

  Characteristic                Characteristic Vector  V'EV=1
            Root    Percent          MVAR1
      0.05652717     100.00     0.12845032

  MANOVA Test Criteria and Exact F Statistics for the
  Hypothesis of No Overall gender Effect
  on the Variables Defined by the M Matrix Transformation
  H = Type III SSCP Matrix for gender
  E = Error SSCP Matrix

  S=1    M=-0.5    N=11.5

  Statistic                      Value   F Value   Num DF   Den DF   Pr > F
  Wilks' Lambda             0.94649719      1.41        1       25   0.2457
  Pillai's Trace            0.05350281      1.41        1       25   0.2457
  Hotelling-Lawley Trace    0.05652717      1.41        1       25   0.2457
  Roy's Greatest Root       0.05652717      1.41        1       25   0.2457

  TRANSFORMED DATA -- 1 RECORD/INDIVIDUAL                              28

  The GLM Procedure
  Multivariate Analysis of Variance

  M Matrix Describing Transformed Variables

                  age1            age2            age3            age4
  MVAR1              0     0.816496581     -0.40824829     -0.40824829

  The GLM Procedure
  Multivariate Analysis of Variance

  Characteristic Roots and Vectors of: E Inverse * H, where
  H = Type III SSCP Matrix for gender
  E = Error SSCP Matrix

  Variables have been transformed by the M Matrix

  Characteristic                Characteristic Vector  V'EV=1
            Root    Percent          MVAR1
      0.18281810     100.00     0.14468480

  MANOVA Test Criteria and Exact F Statistics for the
  Hypothesis of No Overall gender Effect
  on the Variables Defined by the M Matrix Transformation
  H = Type III SSCP Matrix for gender
  E = Error SSCP Matrix

  S=1    M=-0.5    N=11.5

  Statistic                      Value   F Value   Num DF   Den DF   Pr > F
  Wilks' Lambda             0.84543853      4.57        1       25   0.0425
  Pillai's Trace            0.15456147      4.57        1       25   0.0425
  Hotelling-Lawley Trace    0.18281810      4.57        1       25   0.0425
  Roy's Greatest Root       0.18281810      4.57        1       25   0.0425

  TRANSFORMED DATA -- 1 RECORD/INDIVIDUAL                              30

  The GLM Procedure
  Multivariate Analysis of Variance

  M Matrix Describing Transformed Variables

                  age1            age2            age3            age4
  MVAR1              0               0     0.707106781    -0.707106781

  TRANSFORMED DATA -- 1 RECORD/INDIVIDUAL                              31

  The GLM Procedure
  Multivariate Analysis of Variance

  Characteristic Roots and Vectors of: E Inverse * H, where
  H = Type III SSCP Matrix for gender
  E = Error SSCP Matrix

  Variables have been transformed by the M Matrix

  Characteristic                Characteristic Vector  V'EV=1
            Root    Percent          MVAR1
      0.04612159     100.00     0.15861032

  MANOVA Test Criteria and Exact F Statistics for the
  Hypothesis of No Overall gender Effect
  on the Variables Defined by the M Matrix Transformation
  H = Type III SSCP Matrix for gender
  E = Error SSCP Matrix

  S=1    M=-0.5    N=11.5

  Statistic                      Value   F Value   Num DF   Den DF   Pr > F
  Wilks' Lambda             0.95591182      1.15        1       25   0.2932
  Pillai's Trace            0.04408818      1.15        1       25   0.2932
  Hotelling-Lawley Trace    0.04612159      1.15        1       25   0.2932
  Roy's Greatest Root       0.04612159      1.15        1       25   0.2932

********************************************************************/

/*******************************************************************
*******************************************************************

  PROC GLM [separate ANOVA tests for the contrasts]

*******************************************************************
*******************************************************************/

/*******************************************************************

  To compare, we apply the contrasts (normalized version) to each
  child's data.  We thus get a single value for each child
  corresponding to each contrast.  These are in the variables
  AGE1P -- AGE3P.  We then use PROC GLM to perform each separate
  ANOVA.  It may be verified that the separate gender sums of squares
  add up to the interaction SS in the analysis above.

********************************************************************/

data dent3; set dent2;
  age1p = sqrt(0.75)*(age1-age2/3-age3/3-age4/3);
  age2p = sqrt(2/3)*(age2-age3/2-age4/2);
  age3p = sqrt(1/2)*(age3-age4);
run;

/* data= is named explicitly rather than relying on the most
   recently created data set */
proc glm data=dent3;
  class gender;
  model age1p age2p age3p = gender;
run;

/*******************************************************************

  The GLM Procedure

  Dependent Variable: age1p

                                  Sum of
  Source              DF         Squares    Mean Square   F Value   Pr > F
  Model                1      3.42599607     3.42599607      1.41   0.2457
  Error               25     60.60795455     2.42431818
  Corrected Total     26     64.03395062

  R-Square     Coeff Var      Root MSE    age1p Mean
  0.053503     -73.36496      1.557022     -2.122297

  Source              DF       Type I SS    Mean Square   F Value   Pr > F
  gender               1      3.42599607     3.42599607      1.41   0.2457

  Source              DF     Type III SS    Mean Square   F Value   Pr > F
  gender               1      3.42599607     3.42599607      1.41   0.2457

  TRANSFORMED DATA -- 1 RECORD/INDIVIDUAL                              34

  The GLM Procedure

  Dependent Variable: age2p

                                  Sum of
  Source              DF         Squares    Mean Square   F Value   Pr > F
  Model                1      8.73320006     8.73320006      4.57   0.0425
  Error               25     47.76988636     1.91079545
  Corrected Total     26     56.50308642

  R-Square     Coeff Var      Root MSE    age2p Mean
  0.154561     -76.82446      1.382315     -1.799317

  Source              DF       Type I SS    Mean Square   F Value   Pr > F
  gender               1      8.73320006     8.73320006      4.57   0.0425

  Source              DF     Type III SS    Mean Square   F Value   Pr > F
  gender               1      8.73320006     8.73320006      4.57   0.0425

  TRANSFORMED DATA -- 1 RECORD/INDIVIDUAL                              35

  The GLM Procedure

  Dependent Variable: age3p

                                  Sum of
  Source              DF         Squares    Mean Square   F Value   Pr > F
  Model                1      1.83333333     1.83333333      1.15   0.2932
  Error               25     39.75000000     1.59000000
  Corrected Total     26     41.58333333

  R-Square     Coeff Var      Root MSE    age3p Mean
  0.044088     -123.4561      1.260952     -1.021376

  Source              DF       Type I SS    Mean Square   F Value   Pr > F
  gender               1      1.83333333     1.83333333      1.15   0.2932

  Source              DF     Type III SS    Mean Square   F Value   Pr > F
  gender               1      1.83333333     1.83333333      1.15   0.2932

*******************************************************************/
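
/*******************************************************************

  A quick arithmetic check of the additivity claim, as a sketch only.
  The values are copied by hand from the three separate ANOVA tables
  above; the names gender_ss and error_ss are made up for this
  illustration.  The sums written to the log can be compared with the
  age*gender SS (13.9925295) and the Error(age) SS (148.1278409) from
  the univariate repeated measures analysis; agreement is expected
  only up to rounding in the printed values.

*******************************************************************/

data _null_;
  /* gender SS for age1p, age2p, age3p */
  gender_ss = 3.42599607 + 8.73320006 + 1.83333333;
  /* error SS for age1p, age2p, age3p */
  error_ss  = 60.60795455 + 47.76988636 + 39.75000000;
  put gender_ss= error_ss=;
run;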