/*******************************************************************
CHAPTER 6, EXAMPLE 1

Analysis of the dental study data by one-way MANOVA using PROC GLM
 - the repeated measurement factor is age (time)
 - there is one "treatment" factor, gender
*******************************************************************/

options ls=80 ps=59 nodate; run;

/******************************************************************
See Example 1 in Chapter 4 for the form of the input data set.
It is not in the correct form for the analysis; thus we create a
new data set in which each record holds all the observations from
one unit (child).
*******************************************************************/

* read the data, one record per child-by-age observation;
data dent1; infile "dental.txt";
  input obsno child age distance gender;
run;

proc print data=dent1; run;

* recode age (8, 10, 12, 14) as the time index 1-4;
data dent1; set dent1;
  if age=8 then age=1;
  if age=10 then age=2;
  if age=12 then age=3;
  if age=14 then age=4;
  drop obsno;
run;

proc sort data=dent1;
  by gender child;
run;

* one record per child: age1-age4 hold the four distances;
data dent2(keep=age1-age4 gender);
  array aa{4} age1-age4;
  do age=1 to 4;
    set dent1;
    by gender child;
    aa{age}=distance;
    if last.child then return;
  end;
run;

proc print data=dent2; run;

/*******************************************************************
********************************************************************
PROC CORR
********************************************************************
*******************************************************************
The sample mean vectors for each gender were found in Example 1 of
Chapter 4. Here, we use PROC CORR to calculate the estimates of the
covariance matrix, separately for each group. The COV option asks
for the covariance matrix to be printed.
*******************************************************************/

proc sort data=dent2; by gender; run;

proc corr data=dent2 cov;
  by gender;
  var age1 age2 age3 age4;
run;

/*******************************************************************
PARTIAL OUTPUT
*******************************************************************
*****************************************************************
Pearson Correlation Coefficients, N = 11
Prob > |r| under H0: Rho=0

            age1       age2       age3       age4

age1     1.00000    0.83009    0.86231    0.84136
                     0.0016     0.0006     0.0012

age2     0.83009    1.00000    0.89542    0.87942
          0.0016                0.0002     0.0004

age3     0.86231    0.89542    1.00000    0.94841
          0.0006     0.0002                <.0001

age4     0.84136    0.87942    0.94841    1.00000
          0.0012     0.0004     <.0001
*****************************************************************/

/*******************************************************************
********************************************************************
PROC GLM [MANOVA]
********************************************************************
********************************************************************
Use PROC GLM to carry out the multivariate analysis. The data must
be in 'multivariate' form (one record per unit, one variable per
time point), as in dent2.

additional reference:
http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_glm_sect017.htm

PROC GLM with the MANOVA statement does the 1-way MANOVA test for
H0: mu1=mu2. (Because there are 2 groups, this is equivalent to
Hotelling's T^2 test.)

 - H= specifies the effects that define the null hypothesis H0.
 - The PRINTH and PRINTE options print the SS&CP matrices Q_H and
   Q_E corresponding to H0.
 - The NOUNI option (MODEL statement) suppresses printing of the
   individual ANOVA for the data at each age. Without this option,
   PROC GLM does a separate univariate ANOVA on the data at each
   age. NOUNI is not used in the call below, so the univariate
   ANOVAs are printed.
*******************************************************************/

proc glm data=dent2;
  class gender;
  model age1 age2 age3 age4 = gender;
  manova h=gender / printh printe;
run;

/*******************************************************************
PARTIAL OUTPUT
*******************************************************************
The printed output from MANOVA is quite large. The first sections
printed are the standard univariate ANOVA results for the response
at each time (age1, age2, age3, age4).

The ANOVA table gives the sum of squares (SS) explained by the
model and associated statistics in the row labeled 'Model', the
error SS in the row labeled 'Error', and the total SS in the row
labeled 'Corrected Total'.

To find the result of the hypothesis test comparing the groups,
look at the row labeled 'Model': the column 'F Value' gives the
value of the test statistic and 'Pr > F' gives the associated
p-value. A small significance probability, Pr > F, indicates that
some linear function of the parameters (here mu1-mu2) is
significantly different from zero.

Next come some measures of how well the linear model for the ANOVA
fits the data:

 - R-Square measures how much of the variation in the dependent
   variable is accounted for by the model. It is the ratio of the
   model SS to the corrected total SS and ranges from 0 to 1. In
   general, the larger the value of R-Square, the better the
   model's fit.
 - Coeff Var, the coefficient of variation, is 100 times Root MSE
   divided by the mean of the dependent variable. It expresses the
   residual variation relative to the mean and is often preferred
   because it is unitless.
 - Root MSE is the square root of the Mean Square for Error. It
   estimates the standard deviation of the error term, that is, of
   the dependent variable within a group.

Be careful: the conclusion of the ANOVA is valid only if the 3
assumptions it makes are acceptable, namely:
 1) the observations are independent,
 2) the observations for each group are a random sample from a
    population with a normal distribution (this can be checked by
    a test for normality),
 3) the variances for the two independent groups are equal (for
    the usual univariate ANOVA this can be checked within PROC GLM
    using the HOVTEST option in the MEANS statement).

*****************************************************************
Dependent Variable: age1

                              Sum of
Source             DF        Squares    Mean Square   F Value   Pr > F
Model               1     18.6877104     18.6877104      3.45   0.0750
Error              25    135.3863636      5.4154545
Corrected Total    26    154.0740741

R-Square     Coeff Var     Root MSE     age1 Mean
0.121290      10.48949     2.327113      22.18519
*****************************************************************

Q_E matrix [measure of the WITHIN GROUP VARIATION]
Q_E divided by its degrees of freedom (25) is the pooled
covariance estimate of SIGMA; e.g. 135.386/25 = 5.415, the Mean
Square for Error for age1 above.
-----------------------------------------------------------------
The GLM Procedure
Multivariate Analysis of Variance

E = Error SSCP Matrix

              age1            age2            age3            age4
age1  135.38636364    67.920454545    97.755681818    67.755681818
age2  67.920454545    104.61931818    73.178977273    82.928977273
age3  97.755681818    73.178977273    161.39346591    103.26846591
age4  67.755681818    82.928977273    103.26846591    124.64346591
-----------------------------------------------------------------

Q_H matrix [measure of the BETWEEN GROUP VARIATION]
-----------------------------------------------------------------
The GLM Procedure

H = Type III SSCP Matrix for gender

              age1            age2            age3            age4
age1  18.687710438    17.496212121    29.003577441    37.281355219
age2  17.496212121    16.380681818    27.154356061    34.904356061
age3  29.003577441    27.154356061    45.013941498    57.861163721
age4  37.281355219    34.904356061    57.861163721    74.375052609
-----------------------------------------------------------------

The first part of the multivariate printout resembles the output
of a PCA: it lists characteristic roots and vectors.
A characteristic root is an eigenvalue. Here the roots are the
eigenvalues of the inverse of the error SS&CP matrix (Q_E) times
the model SS&CP matrix (Q_H). This product is a 4x4 matrix, so
there are four roots, each with its own eigenvector. Because
gender has only 1 degree of freedom, Q_H has rank 1 and only the
first root is nonzero.

The percents listed next to the characteristic roots give the
share of the variability attributed to each root and its vector.
In this example, the first root accounts for 100% of the
variability in the effect of GENDER. Each row of the table gives
the characteristic vector, i.e. eigenvector, of the root in that
row.
-----------------------------------------------------------------
Characteristic              Characteristic Vector  V'EV=1
          Root  Percent         age1         age2         age3         age4

    0.66030051   100.00   0.01032388  -0.04593889  -0.01003125   0.11841126
    0.00000000     0.00  -0.07039943   0.13377597   0.00249339  -0.02943257
    0.00000000     0.00  -0.08397385  -0.01167207   0.12114416  -0.04667529
    0.00000000     0.00   0.05246789   0.05239507   0.05062221  -0.09027154
-----------------------------------------------------------------

The next portion of the printout shows the test results for the
GENDER effect. SAS produces four test statistics. Typically it
makes little difference which one is used; Wilks' Lambda is a good
general choice. Each statistic is converted to an F statistic,
which is interpreted in the same manner as in the univariate case.

In our case there are 2 groups, hence all 4 tests give the same F
value; furthermore, they are equivalent to Hotelling's T^2 test:
T^2 = 25 * 0.66030051 = 16.51 and F = T^2 * 22/(25*4) = 3.63.
Here the common F value is 3.63 with 4 (Num DF) and 22 (Den DF)
degrees of freedom. The p-value is 0.0203, which is significant at
the usual 0.05 level.

For models with several effects, the complete printout gives the
same set of results for each main effect and interaction.
-----------------------------------------------------------------
MANOVA Test Criteria and Exact F Statistics
for the Hypothesis of No Overall gender Effect
H = Type III SSCP Matrix for gender
E = Error SSCP Matrix

S=1 M=1 N=10

Statistic                      Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda             0.60230061      3.63        4       22   0.0203
Pillai's Trace            0.39769939      3.63        4       22   0.0203
Hotelling-Lawley Trace    0.66030051      3.63        4       22   0.0203
Roy's Greatest Root       0.66030051      3.63        4       22   0.0203
*****************************************************************/

/*******************************************************************
********************************************************************
PROC GLM [repeated]
********************************************************************
*******************************************************************
PROC GLM with the REPEATED statement is used to do profile
analysis. The NOU option in the REPEATED statement suppresses
printing of the univariate tests of the within-unit factors.

The within-unit analyses using different contrast matrices will be
the same as in the univariate case (see the discussion in section
4.6). Thus, we do not do this analysis here.
*******************************************************************/

proc glm data=dent2;
  class gender;
  model age1 age2 age3 age4 = gender / nouni;
  repeated age / nou;
run;

/*******************************************************************
PARTIAL OUTPUT
*******************************************************************
The test for constancy (age effect here) is the multivariate test
for assessing whether the profiles are in fact constant over time,
assuming that the profiles are parallel. It may be viewed as a
refinement test, carried out only once the hypothesis of
parallelism is NOT rejected. All 4 multivariate tests are equal,
but the common test is different from the univariate F test (which
is based on the compound symmetry assumption).
-----------------------------------------------------------------
MANOVA Test Criteria and Exact F Statistics
for the Hypothesis of no age Effect
H = Type III SSCP Matrix for age
E = Error SSCP Matrix

S=1 M=0.5 N=10.5

Statistic                      Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda             0.19479424     31.69        3       23   <.0001
Pillai's Trace            0.80520576     31.69        3       23   <.0001
Hotelling-Lawley Trace    4.13362211     31.69        3       23   <.0001
Roy's Greatest Root       4.13362211     31.69        3       23   <.0001
-----------------------------------------------------------------

The test for parallelism (age*gender effect here) is the
multivariate test for assessing the effect of the interaction
between group and time. Because we have 2 groups, all the tests
are the same; however, the common test is different from the
univariate F test (which is based on the compound symmetry
assumption).
-----------------------------------------------------------------
MANOVA Test Criteria and Exact F Statistics
for the Hypothesis of no age*gender Effect
H = Type III SSCP Matrix for age*gender
E = Error SSCP Matrix

S=1 M=0.5 N=10.5

Statistic                      Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda             0.73988739      2.70        3       23   0.0696
Pillai's Trace            0.26011261      2.70        3       23   0.0696
Hotelling-Lawley Trace    0.35155702      2.70        3       23   0.0696
Roy's Greatest Root       0.35155702      2.70        3       23   0.0696
-----------------------------------------------------------------

The "between subjects" (units) test is the test for coincidence,
assuming the profiles are parallel. It is based on averaging
across times and is the same as the univariate F test: averaging
the responses within each unit removes the dependence on how the
observations on a unit are correlated.
-----------------------------------------------------------------
The GLM Procedure
Repeated Measures Analysis of Variance
Tests of Hypotheses for Between Subjects Effects

Source     DF    Type III SS    Mean Square   F Value   Pr > F
gender      1    140.4648569    140.4648569      9.29   0.0054
Error      25    377.9147727     15.1165909
*******************************************************************/

/*******************************************************************
*******************************************************************
PROC GLM [MANOVA statement, NORMALIZED HELMERT transformation]
*******************************************************************
*******************************************************************
PROC GLM with the MANOVA statement is used to test that a set of
contrasts is equal to 0. The contrasts are specified using the
Helmert transformation matrix, which is normalized by hand.

Recall that each Helmert contrast compares the mean at the current
time with the average of the means at the following times.
********************************************************************/

* gender is declared in CLASS, as in the earlier PROC GLM calls;
* each M= row below is one normalized Helmert contrast;
proc glm data=dent2;
  class gender;
  model age1 age2 age3 age4 = gender / nouni;
  manova h=gender m=0.866025404*age1 - 0.288675135*age2
                    - 0.288675135*age3 - 0.288675135*age4;
  manova h=gender m=0.816496581*age2 - 0.40824829*age3
                    - 0.40824829*age4;
  manova h=gender m=0.707106781*age3 - 0.707106781*age4;
run;

/*******************************************************************
PARTIAL OUTPUT
*******************************************************************
We get a separate test for each contrast. Recall that when the
contrasts are orthogonal it makes sense to interpret the results
separately.

The M matrix printed first gives the weights of the contrast. The
4 tests are all equal; furthermore, they are equal to the
univariate F tests obtained with PROC GLM, the REPEATED statement,
and contrasts specified by the Helmert transformation.
(see the output presented in the previous class)
-----------------------------------------------------------------
M Matrix Describing Transformed Variables

                age1            age2            age3            age4
MVAR1    0.866025404    -0.288675135    -0.288675135    -0.288675135

MANOVA Test Criteria and Exact F Statistics
for the Hypothesis of No Overall gender Effect
on the Variables Defined by the M Matrix Transformation
H = Type III SSCP Matrix for gender
E = Error SSCP Matrix

S=1 M=-0.5 N=11.5

Statistic                      Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda             0.94649719      1.41        1       25   0.2457
Pillai's Trace            0.05350281      1.41        1       25   0.2457
Hotelling-Lawley Trace    0.05652717      1.41        1       25   0.2457
Roy's Greatest Root       0.05652717      1.41        1       25   0.2457
-----------------------------------------------------------------
M Matrix Describing Transformed Variables

                age1            age2            age3            age4
MVAR1              0     0.816496581     -0.40824829     -0.40824829

MANOVA Test Criteria and Exact F Statistics
for the Hypothesis of No Overall gender Effect
on the Variables Defined by the M Matrix Transformation
H = Type III SSCP Matrix for gender
E = Error SSCP Matrix

S=1 M=-0.5 N=11.5

Statistic                      Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda             0.84543853      4.57        1       25   0.0425
Pillai's Trace            0.15456147      4.57        1       25   0.0425
Hotelling-Lawley Trace    0.18281810      4.57        1       25   0.0425
Roy's Greatest Root       0.18281810      4.57        1       25   0.0425
********************************************************************/
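
/*******************************************************************
ILLUSTRATIVE SKETCH (not part of the original example)
*******************************************************************
The comparison with the REPEATED statement mentioned above can be
reproduced along the following lines. This is a minimal sketch,
assuming the data set dent2 created earlier in this program.

 - "age 4 helmert" names the within-unit factor, gives its number
   of levels, and requests the Helmert transformation.
 - SUMMARY asks for an ANOVA table for each contrast variable.
 - PRINTM prints the transformation matrix.
 - NOM and NOU suppress the multivariate and univariate
   within-unit tests, which were already examined above.

The Helmert matrix built by REPEATED is not normalized. Rescaling
a contrast does not change its F statistic, so the 'gender' row in
the table for each contrast should agree with the corresponding
MANOVA test above.
*******************************************************************/

proc glm data=dent2;
  class gender;
  model age1 age2 age3 age4 = gender / nouni;
  repeated age 4 helmert / summary printm nom nou;
run;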