Homework 4 St 708
 

A real data example

Questions

/*
(1.) These are the data from chapter 5. Please read through chapter 5, which describes one analysis of this (real) data set. Page xi of the preface tells you that the data are located at http://www.stat.ncsu.edu/publications/ Go there, click on Rawlings, and find the data set LINTH-5.DTA

A. Get the data into the SAS program editor window, then create the appropriate data step to read it. I will use the names as given on page 163. Using PROC CORR, compute the correlation matrix as on page 164 of the text. If there is any discrepancy, we will go with the data off the internet.

B. Regress BIO (this is Y) on zinc, Zn. Is the slope significant? Compute R(pH|X0, Zn) and R(Zn|X0, pH), where X0 stands for the intercept. Which of these are significant? (Use the MSE from the regression with both of these variables.) How is the R-square from the simple linear regression of BIO on Zn related to an entry of your correlation matrix?

C. Regress BIO on all the other available numeric variables. Construct a 95% confidence interval for the coefficient of Zn in this full regression. (See Rawlings et al., pg. 164-5.)

D. Rawlings suggests (pg. 170) that BIO can be modeled using just pH and K. By subtraction, test the full model from part C against this reduced model. Repeat the same test in a single PROC REG by using a test statement. Write down the K and M matrices as in H0: K Beta - M = 0 (K is a matrix here, not a symbol for potassium) that are being used in this test, assuming the variables are entered in the order in the text: intercept, Sal, pH, K (potassium), Na, and Zn. Do the test a third time by appropriately rearranging variables in your full MODEL statement and requesting the type I sums of squares. Explain how the F test can be computed from these Type I SS.

E. In the full model, table 5.2 gives the 5 estimated coefficients.
Suppose (just for illustration) I want to test that the true values of these coefficients are, respectively, -35.0000, 300.0000, -1.0000, 0, and -18.0000. Compute the K and M matrices for this test as in H0: K Beta - M = 0. (K is a matrix here, not a symbol for potassium.) Issue a test statement in PROC REG to test this hypothesis.

F. Suppose (just for illustration) I want to test the joint null hypothesis that the true coefficients of SAL and Zn are the same as each other, the coefficient of Na is 0, and the coefficient of K is -.5.
  (i) Regress BIO + .5K on pH and some linear combination of SAL and Zn, with an intercept, so that this gives your reduced model. Compute the F test by comparing this to the full model.
  (ii) Compute the F test by writing your K and M matrices, then issuing a test statement in the full model to do the job.

G. Using the variable names as above, this program creates the principal components of your data. Do prin1 and prin2 have interpretations? (I did not see any neat ones. Do you? No right or wrong answer.) Plot prin1 vs prin2 using location as a plot symbol.

  proc princomp out=outlin;
    var bio sal ph K Na Zn;
  run;
  proc gplot;
    plot prin1*prin2=loc / legend;
    symbol1 v=dot i=none c=red;
    symbol2 v=dot i=none c=green;
    symbol3 v=dot i=none c=cyan;
  run;

Plot the first 3 principal components in a 3-D plot, again using color to denote location.

  DATA OUTLIN;
    SET OUTLIN;
    * declare the length first so that GREEN and CYAN are not truncated;
    LENGTH C $ 5;
    IF LOC = "OI" THEN C = "RED";
    IF LOC = "SI" THEN C = "GREEN";
    IF LOC = "SM" THEN C = "CYAN";
  run;
  proc g3d;
    scatter prin3*prin2=prin1 / shape="balloon" size=.75 COLOR=c;
  run;

Turn in the plots. Any comment about whether the locations form clusters in these plots? (Again, no right or wrong; just note that this is one kind of thing you might do with principal components.)

Optional: use SAS INSIGHT or JMP to create a rotating 3-D plot. (Nothing to hand in.)

(Postponed prob. 4.3 to next time.)

(2.)
Practice with quadratic forms: Suppose E is a column vector of 5 independent N(0,1) random variables, that is, these are normally distributed with mean 0 and variance 1. Suppose also that V is this matrix:

       13   6  -1  -1  -1
        6   4   2   2   2
  V =  -1   2   5   5   5
       -1   2   5   5   5
       -1   2   5   5   5

Is V symmetric? Find, if possible, a scalar multiple m of V so that A = mV is idempotent. (Hint: Compute VV and compare it to V.) Let Y be the quadratic form Y = E'VE. Find the expected value of Y. Find a value c such that Pr{Y > c} = .05. (Hint: Express E'VE as Z'AZ where A = mV is idempotent.)
*/
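
The hints in problem (2.) rest on a few standard facts about quadratic forms in standard normal vectors. The block below is only a sketch of that reasoning, not a worked answer. The scalar k and the rank r are symbols introduced here for illustration (they do not appear in the assignment), and the relation VV = kV with k > 0 is an assumption that the first hint asks you to check for yourself.

```latex
% Assumptions (not from the assignment): E ~ N(0, I_5), V is symmetric,
% and VV = kV for some scalar k > 0. The symbols k and r are illustrative.
\begin{align*}
\mathrm{E}[E'VE] &= \operatorname{tr}\bigl(V\,\mathrm{E}[EE']\bigr)
                  = \operatorname{tr}(V\,I) = \operatorname{tr}(V), \\
A = \tfrac{1}{k}\,V \;&\Rightarrow\;
      AA = \tfrac{1}{k^{2}}\,VV = \tfrac{1}{k^{2}}\,(kV) = \tfrac{1}{k}\,V = A, \\
r = \operatorname{rank}(A) &= \operatorname{tr}(A),
      \qquad E'AE \sim \chi^{2}_{r}, \\
\Pr\{E'VE > c\} &= \Pr\{k\,E'AE > c\} = \Pr\{\chi^{2}_{r} > c/k\}.
\end{align*}
```

Under these assumptions m = 1/k, and c is k times the upper 5% point of the chi-square distribution with r degrees of freedom.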