lab06, homework
/*----homework 6 --------------------------------------------
| The exercise uses the cancer data given in Exercise |
| 9.13, page 320, in the text. The data are on file in the |
| class account under the file 'cancer.dta'. Specifically |
| we have /ncsu/st708_info/www/dickey/cancer.dta. |
| (Note: page xi of the text introduction gives a web |
| address for all data sets in the book). |
| The order of the variables in the data set is TYPE of |
| cancer, SEX of patient, AGE of patient, DAYS survival |
| time, and CONT = mean number of days survival for a |
| control group of 10 similar patients. |
| The first two variables, TYPE and SEX are class variables. |
| All treated patients were given supplemental ascorbate, |
| vitamin C. The point is to study the effect of vitamin C |
| on survival time (compared to controls) and whether this |
| effect depends on cancer type, sex, or age. |
| The analyses will use the logarithm of the ratio of DAYS |
| to CONT, LRATIO = LOG(DAYS/CONT). Therefore, the initial |
| data step creates this new variable. Also, AGE is |
| redefined as deviation from the mean age, 64.3191, in the |
| data step. |
-------------------------------------------------------------*/
options pagesize=50;
data cancer;
infile '/ncsu/st708_info/www/dickey/cancer.dta';
***************************************************************
infile allows you to access data from an ASCII file. If
the data were on drive a, you could say infile 'A:cancer.dta'
The data are at the end of this file, in case you cannot
access the data as shown here from your location.
*************************************************************** ;
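input type $ sex $ age days cont;
lratio = log(days/cont);  /* log of the ratio of treated to control survival */
age = age - 64.3191;      /* center AGE at its sample mean */
run;
/* The printed sum of the centered AGE values should be close to zero. */
proc print data=cancer; sum age;
run;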
/*-------------------------------------------------------------------------
| 1. LRATIO will be used as the response variable. Regard TYPE of |
| cancer and SEX of patient as two factors in a 3X2 factorial with |
| differing numbers of patients per treatment combination. That is, |
| we will treat the basic analysis as a 3X2 factorial in a completely |
| random design. The unequal numbers of patients make these data |
| unbalanced so that PROC GLM must be used and LSMEANS must be used for |
| any comparisons. We will talk later in more detail about the analysis|
| of unbalanced data. |
| Do the appropriate factorial analysis of variance of LRATIO. Two |
| kinds of hypotheses are of interest: (1) Does supplemental vitamin C |
| increase the survival time of patients (i.e., is mean LRATIO > 0)? |
| and (2) Is LRATIO affected by TYPE of cancer or SEX of patient? |
| Here "affected" means through main effect or interaction. |
| Do appropriate tests of significance to test these hypotheses. |
| NOTE that the first hypothesis requires a test that the mean is |
| zero, not that the treatments have no effect. For this purpose, one |
| can use an ESTIMATE statement, but care must be taken to make |
| sure the quantity specified is estimable. For example, the |
| statement: ESTIMATE 'MEAN' INTERCEPT 1; specifies a non-estimable |
| function of the parameters; i.e., the mean in an effects model is |
| not estimable. Since your model includes 3 TYPE, 2 SEX, and 6 |
| TYPE*SEX interaction effects, the overall mean of a balanced data |
| set estimates mu + the average of the effects of each kind. Thus, |
| the ESTIMATE statement must be: |
| ESTIMATE 'MEAN' INTERCEPT 1 TYPE .33333 .33333 .33333 SEX .5 .5 |
| SEX*TYPE .166667 .166667 .166667 .166667 .166667 .166667; |
| but since this may involve roundoff error you are better off |
| specifying |
| ESTIMATE 'MEAN' INTERCEPT 6 TYPE 2 2 2 SEX 3 3 |
| SEX*TYPE 1 1 1 1 1 1/divisor=6; |
| Is the estimate the same as just the average of all the data? |
--------------------------------------------------------------------------*/
proc glm data= cancer;
class ...
model ...
estimate ...
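/*-------------------------------------------------------------------------
| Sketch for question 1: one possible completion, not the required |
| answer. It assumes TYPE and SEX are listed in that order on the CLASS |
| statement and uses the divisor form of the ESTIMATE statement quoted |
| in the question. The LSMEANS options and the PROC MEANS step are |
| additions made here, the latter so the estimate can be compared with |
| the raw average of LRATIO. |
--------------------------------------------------------------------------*/
proc glm data=cancer;
  class type sex;
  model lratio = type sex type*sex;
  /* equally weighted average of the six TYPE*SEX cell means */
  estimate 'MEAN' intercept 6 type 2 2 2 sex 3 3
           type*sex 1 1 1 1 1 1 / divisor=6;
  /* least squares means, needed for comparisons in unbalanced data */
  lsmeans type sex type*sex / stderr pdiff;
run;
quit;
/* raw average of LRATIO over all patients, for comparison */
proc means data=cancer n mean stderr;
  var lratio;
run;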
/*-------------------------------------------------------------------------
| 2. Display the X matrix for your analysis of variance above. SAS |
| allows you to do this as shown below (same CLASS, MODEL statements |
| as GLM). Go left to right through this X matrix, circling the column |
| numbers for linear dependencies as they are encountered. (The |
| interesting thing here is the relationship to degrees of freedom.) For|
| example, if sex is the first factor in your model then column 1 is |
| the intercept and column 3 is the last sex dummy variable - the one |
| that you'd circle. |
-------------------------------------------------------------------------*/
proc glmmod;
class ...
model ...
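/*-------------------------------------------------------------------------
| Sketch for question 2, assuming the model TYPE SEX TYPE*SEX with TYPE |
| listed first (an assumption; your factor order may differ). Under |
| that order the X matrix has 1 + 3 + 2 + 6 = 12 columns: column 1 is |
| the intercept, columns 2-4 are TYPE, columns 5-6 are SEX, and columns |
| 7-12 are TYPE*SEX. With six nonempty cells its rank is 6, that is |
| 1 (intercept) + 2 (TYPE) + 1 (SEX) + 2 (TYPE*SEX). |
--------------------------------------------------------------------------*/
proc glmmod data=cancer;
  class type sex;
  model lratio = type sex type*sex;
run;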
/*-------------------------------------------------------------------------
| 3. Investigate the use of AGE of patient as a covariate in the |
| analysis. First, run a covariance analysis that will test for |
| homogeneity of the AGE regression slope across (cancer type, sex) |
| combinations. State your conclusions from this test. |
-------------------------------------------------------------------------*/
proc glm data= cancer;
class ...
model ...
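/*-------------------------------------------------------------------------
| Sketch for question 3: one possible parameterization, offered as an |
| assumption rather than the intended answer. AGE fits a common slope |
| and AGE*TYPE*SEX, entered last, lets the slope differ among the six |
| (cancer type, sex) cells. The F test for AGE*TYPE*SEX is then a test |
| of homogeneity of the AGE slopes. |
--------------------------------------------------------------------------*/
proc glm data=cancer;
  class type sex;
  /* AGE has already been centered at its mean in the data step */
  model lratio = type sex type*sex age age*type*sex;
run;
quit;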
/*-------------------------------------------------------------------------
| 4. Regardless of the result of the test of homogeneity of regressions, |
| complete the analysis of covariance assuming a common regression. |
| Obtain adjusted treatment means for cancer type, sex, and all 6 |
| cancer_type*sex combinations. To what age are the means adjusted? |
| Was the covariate useful in increasing the precision of the analysis?|
| Interpret the results and compare your conclusions with those from |
| the simple analysis of variance. |
--------------------------------------------------------------------------*/
proc glm data= cancer;
class ...
model ...
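/*-------------------------------------------------------------------------
| Sketch for question 4, assuming a single common AGE slope. The |
| SOLUTION, STDERR, and PDIFF options are choices made for this sketch. |
| By default LSMEANS evaluates a covariate at its mean, so look at the |
| mean of the centered AGE variable when deciding to what age the means |
| are adjusted. Comparing the error mean square here with the one from |
| the model without AGE is one way to judge the gain in precision. |
--------------------------------------------------------------------------*/
proc glm data=cancer;
  class type sex;
  /* SOLUTION prints the estimated common AGE slope */
  model lratio = type sex type*sex age / solution;
  /* adjusted means for cancer type, sex, and the six combinations */
  lsmeans type sex type*sex / stderr pdiff;
run;
quit;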
/*--------------------------------------------------------------------------
| 5. Plot LRATIO against age using the plot symbol M for males and F for |
| females. Make another such plot that uses the type of cancer (S, B, |
| C) as the plot symbol. |
---------------------------------------------------------------------------*/
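/*-------------------------------------------------------------------------
| Sketch for question 5 using PROC PLOT, which is an assumption about |
| the intended procedure. In a request of the form Y*X=variable, the |
| first character of the variable's value becomes the plot symbol, so |
| SEX gives M and F and TYPE gives S, B, and C. AGE on the horizontal |
| axis is the centered age created in the data step. |
--------------------------------------------------------------------------*/
proc plot data=cancer;
  plot lratio*age=sex;   /* symbols M and F */
  plot lratio*age=type;  /* symbols S, B, and C */
run;
quit;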
/*-------------------------------------------------------------------------
| OPTIONAL |
| The variable LRATIO can be thought of as log(DAYS) - log(CONT). If |
| log(CONT) is moved to the right of the equality in a model, |
| it is evident that analysis of LRATIO is equivalent to analysis of |
| Y=log(DAYS) with log(CONT) being used as a covariate BUT with the |
| regression coefficient being forced to be 1.0. Use LDAYS=log(DAYS) |
| as the response variable and repeat question (3) above but include |
| both AGE and LCONT=log(CONT) as covariates. (This will require |
| another data step to create the two variables LDAYS and LCONT.) |
| Test the null hypothesis that the regression coefficient for LCONT |
| is equal to 1.0. Do the results of this analysis suggest that both |
| covariates are useful? If not, which one would you drop and why? |
| Obtain the adjusted treatment means and interpret the results for |
| your final analysis. |
--------------------------------------------------------------------------*/
data cancer2; set cancer;
LDAYS = log(DAYS);
LCONT = log(CONT);
proc glm data=cancer2;
class ...
model ...
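/*-------------------------------------------------------------------------
| Sketch for the optional question, assuming common slopes for both |
| covariates (the homogeneity step is not repeated here). The first |
| step fits LDAYS with AGE and LCONT as covariates. The second step is |
| one way to test that the LCONT coefficient equals 1.0: since |
| LRATIO = LDAYS - LCONT, regressing LRATIO on the same terms gives an |
| LCONT coefficient of (beta - 1), so its usual t test against zero is |
| a test of beta = 1. CANCER2 keeps LRATIO because it is SET from CANCER.|
--------------------------------------------------------------------------*/
proc glm data=cancer2;
  class type sex;
  model ldays = type sex type*sex age lcont / solution;
  lsmeans type sex type*sex / stderr;
run;
quit;
proc glm data=cancer2;
  class type sex;
  /* the LCONT coefficient here estimates beta - 1 */
  model lratio = type sex type*sex age lcont / solution;
run;
quit;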
/* -- Here are the data in case the INFILE statement does
not work at your location:
cards;
S F 61 124 38
S M 69 12 18
S F 62 19 36
S F 66 45 12
S M 63 257 64
S M 79 23 20
S M 76 128 13
S M 54 46 51
S M 62 90 10
S F 69 876 19
S M 46 123 52
S M 57 310 28
S F 59 359 55
B M 74 74 33
B M 74 423 18
B M 66 16 20
B M 52 450 58
B F 48 87 13
B F 64 115 49
B M 70 50 38
B M 77 50 24
B M 71 113 18
B M 70 857 18
B M 39 38 34
B M 70 156 20
B M 70 27 27
B M 55 218 32
B M 74 138 27
B M 69 39 39
B M 73 231 65
C F 76 135 18
C F 58 50 30
C M 49 189 65
C M 69 1267 17
C F 70 155 57
C F 68 534 16
C M 50 502 25
C F 74 126 21
C M 66 90 17
C F 76 365 42
C F 56 911 40
C M 65 743 14
C F 74 366 28
C M 58 156 31
C F 60 99 28
C M 77 20 33
C M 38 274 80
************************************** */