lab06, homework
 

Homework

Questions

/*----homework 6 --------------------------------------------
| The exercise uses the cancer data given in Exercise
| 9.13, page 320, in the text. The data are on file in the
| class account under the file 'cancer.dta'. Specifically
| we have /ncsu/st708_info/www/dickey/cancer.dta.
| (Note: page xi of the text introduction gives a web
| address for all data sets in the book).
| The order of the variables in the data set is TYPE of
| cancer, SEX of patient, AGE of patient, DAYS survival
| time, and CONT = mean number of days survival for a
| control group of 10 similar patients.
| The first two variables, TYPE and SEX, are class variables.
| All treated patients were given supplemental ascorbate,
| vitamin C. The point is to study the effect of vitamin C
| on survival time (compared to controls) and whether this
| effect depends on cancer type, sex, or age.
|
| The analyses will use the logarithm of the ratio of DAYS
| to CONT, LRATIO = LOG(DAYS/CONT). Therefore, the initial
| data step creates this new variable. Also, AGE is
| redefined as deviation from the mean age, 64.3191, in the
| data step.
| Followup note
-------------------------------------------------------------*/
options pagesize=50;

data cancer;
   infile '/ncsu/st708_info/www/dickey/cancer.dta';
***************************************************************
 infile allows you to access data from an ASCII file. If the
 data were on drive a, you could say infile 'A:cancer.dta'
 The data are at the end of this file, in case you cannot
 access the data as shown here from your location.
*************************************************************** ;
   input type $ sex $ age days cont;
   lratio = log(days/cont);
   age = age - 64.3191;

proc print data=cancer;
   sum age;
run;

/*-------------------------------------------------------------------------
| 1. LRATIO will be used as the response variable.
|    Regard TYPE of cancer and SEX of patient as two factors in a 3X2
|    factorial with differing numbers of patients per treatment
|    combination. That is, we will treat the basic analysis as a 3X2
|    factorial in a completely random design. The unequal numbers of
|    patients make these data unbalanced so that PROC GLM must be used
|    and LSMEANS must be used for any comparisons. We will talk later in
|    more detail about the analysis of unbalanced data.
|    Do the appropriate factorial analysis of variance of LRATIO. Two
|    kinds of hypotheses are of interest: (1) Does supplemental vitamin C
|    increase the survival time of patients (i.e., is mean LRATIO > 0?)
|    and (2) is LRATIO affected by type of cancer or SEX of patient?
|    Here "affected" means through main effect or interaction.
|    Do appropriate tests of significance to test these hypotheses.
|    NOTE that the first hypothesis requires a test that the mean is
|    zero, not that the treatments have no effect. For this purpose, one
|    can use an ESTIMATE statement but care must be taken to make
|    sure the quantity specified is estimable. For example, the
|    statement: ESTIMATE 'MEAN' INTERCEPT 1; specifies a non-estimable
|    function of the parameters; i.e., the mean in an effects model is
|    not estimable. Since your model includes 3 TYPES, 2 SEXES, and 6
|    TYPEXSEX interaction effects, the mean of a balanced data set
|    estimates mu + the averages of all effects. Thus, the estimate
|    statement must be:
|       ESTIMATE 'MEAN' INTERCEPT 1 TYPE .33333 .33333 .33333 SEX .5 .5
|          SEX*TYPE .166667 .166667 .166667 .166667 .166667 .166667;
|    but since this may involve roundoff error you are better off
|    specifying
|       ESTIMATE 'MEAN' INTERCEPT 6 TYPE 2 2 2 SEX 3 3
|          SEX*TYPE 1 1 1 1 1 1/divisor=6;
|    Is the estimate the same as just the average of all the data?
-------------------------------------------------------------------------*/
proc glm data= cancer;
   class ...
   model ...
   estimate ...

/*-------------------------------------------------------------------------
| 2. Display the X matrix for your analysis of variance above. SAS
|    allows you to do this as shown below (same CLASS, MODEL statements
|    as GLM). Go left to right through this X matrix, circling the column
|    numbers for linear dependencies as they are encountered (the
|    interesting thing here is the relationship to degrees of freedom).
|    For example, if sex is the first factor in your model then column 1
|    is the intercept and column 3 is the last sex dummy variable - the
|    one that you'd circle.
-------------------------------------------------------------------------*/
proc glmmod;
   class ...
   model ...

/*-------------------------------------------------------------------------
| 3. Investigate the use of AGE of patient as a covariate in the
|    analysis. First, run a covariance analysis that will test for
|    homogeneity of the AGE regression slope across (cancer type, sex)
|    combinations. State your conclusions from this test.
-------------------------------------------------------------------------*/
proc glm data= cancer;
   class ...
   model ...

/*-------------------------------------------------------------------------
| 4. Regardless of the result of the test of homogeneity of regressions,
|    complete the analysis of covariance assuming a common regression.
|    Obtain adjusted treatment means for cancer type, sex, and all 6
|    cancer_type*sex combinations. To what age are the means adjusted?
|    Was the covariate useful in increasing the precision of the analysis?
|    Interpret the results and compare your conclusions with those from
|    the simple analysis of variance.
-------------------------------------------------------------------------*/
proc glm data= cancer;
   class ...
   model ...

/*-------------------------------------------------------------------------
| 5. Plot LRATIO against age using the plot symbol M for males and F for
|    females.
|    Make another such plot with the type of cancer (S, B, C) as the
|    plot symbol.
-------------------------------------------------------------------------*/

/*-------------------------------------------------------------------------
| OPTIONAL
| The variable LRATIO can be thought of as log(DAYS) - log(CONT). If
| log(CONT) is moved to the right of the equality in a model,
| it is evident that analysis of LRATIO is equivalent to analysis of
| Y=log(DAYS) with log(CONT) being used as a covariate BUT with the
| regression coefficient being forced to be 1.0. Use LDAYS=log(DAYS)
| as the response variable and repeat question (3) above but include
| both AGE and LCONT=log(CONT) as covariates. (This will require
| another data step to create the two variables LDAYS and LCONT.)
| Test the null hypothesis that the regression coefficient for LCONT
| is equal to 1.0. Do the results of this analysis suggest that both
| covariates are useful? If not, which one would you drop and why?
| Obtain the adjusted treatment means and interpret the results for
| your final analysis.
-------------------------------------------------------------------------*/
data cancer2;
   set cancer;
   LDAYS = log(DAYS);
   LCONT = log(CONT);

proc glm data=cancer2;
   class ...
   model ...
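
/*-------------------------------------------------------------------------
| A sketch of the algebra behind the OPTIONAL question. The notation is
| an assumption made here for illustration and is not from the text:
| tau_ij stands for the combined TYPE, SEX, and interaction effects,
| b1 is the AGE slope, and b2 is the LCONT slope.
|
|    LDAYS = mu + tau_ij + b1*AGE + b2*LCONT + e
|
| Setting b2 = 1 and moving LCONT to the left side gives
|
|    LDAYS - LCONT = LRATIO = mu + tau_ij + b1*AGE + e
|
| which is the covariance model for LRATIO. The t test that PROC GLM
| prints for a slope is a test against 0, so one possible way to test
| H0: b2 = 1 is to take the estimate and standard error of the LCONT
| slope from the SOLUTION output and compute
|
|    t = (b2hat - 1) / SE(b2hat)
|
| then compare t with a t distribution on the error degrees of freedom.
-------------------------------------------------------------------------*/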
/* -- Here are the data in case the INFILE statement does not work
      at your location:
cards;
S F 61 124 38
S M 69 12 18
S F 62 19 36
S F 66 45 12
S M 63 257 64
S M 79 23 20
S M 76 128 13
S M 54 46 51
S M 62 90 10
S F 69 876 19
S M 46 123 52
S M 57 310 28
S F 59 359 55
B M 74 74 33
B M 74 423 18
B M 66 16 20
B M 52 450 58
B F 48 87 13
B F 64 115 49
B M 70 50 38
B M 77 50 24
B M 71 113 18
B M 70 857 18
B M 39 38 34
B M 70 156 20
B M 70 27 27
B M 55 218 32
B M 74 138 27
B M 69 39 39
B M 73 231 65
C F 76 135 18
C F 58 50 30
C M 49 189 65
C M 69 1267 17
C F 70 155 57
C F 68 534 16
C M 50 502 25
C F 74 126 21
C M 66 90 17
C F 76 365 42
C F 56 911 40
C M 65 743 14
C F 74 366 28
C M 58 156 31
C F 60 99 28
C M 77 20 33
C M 38 274 80
**************************************
*/
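
/*-------------------------------------------------------------------------
| A minimal sketch of how the data step might read the records in-stream
| when the INFILE path is not available at your location. Assumptions:
| the data set name CANCER_INLINE is hypothetical, and only the first
| three records are shown to keep the sketch short. Paste all of the
| records listed in the data comment between CARDS and the closing
| semicolon before running any analysis. Note that CARDS must be the last
| statement in the data step, so the assignments come before it.
-------------------------------------------------------------------------*/
data cancer_inline;                  /* hypothetical data set name        */
   input type $ sex $ age days cont;
   lratio = log(days/cont);          /* log ratio of DAYS to CONT         */
   age = age - 64.3191;              /* deviation from the mean age       */
   cards;
S F 61 124 38
S M 69 12 18
S F 62 19 36
;
run;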