Homework 4 St 708
A real data example
Questions
/*
(1.) These are the data from chapter 5. Please read
through chapter 5, which describes one analysis of this
(real) data set.
Page xi of the preface tells you that the data are
located at
http://www.stat.ncsu.edu/publications/
Go there, click on Rawlings, and find the data set
LINTH-5.DTA
A. Get the data into the SAS program editor window then
create the appropriate data step to read it. I will
use the names as given on page 163. Using PROC CORR,
compute the correlation matrix as on page 164 of the text.
If there is any discrepancy, we will go with the data
off the internet.
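A minimal sketch of the data step and PROC CORR call for part A. The data set name linth, the file path, the absence of header lines, and the column order (observation number, location, type, then BIO, SAL, pH, K, Na, Zn) are all assumptions; check them against the downloaded file and page 163 before running. If you paste the data into the editor instead, replace the INFILE statement with DATALINES.

```sas
* Assumed layout: obs loc type bio sal ph k na zn with no header line ;
data linth;
  infile 'LINTH-5.DTA';   * hypothetical path to the downloaded file ;
  input obs loc $ type $ bio sal ph k na zn;
run;

* Correlation matrix of the response and the five predictors ;
proc corr data=linth;
  var bio sal ph k na zn;
run;
```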
B. Regress BIO (this is Y) on zinc (Zn). Is the slope
significant? Compute R(pH|X0, Zn) and R(Zn|X0, pH),
where X0 stands for the intercept. Which of these
are significant? (Use the MSE from the regression
with both of these variables.) How is the R-square
from the simple linear regression of BIO on Zn
related to an entry of your correlation matrix?
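One possible way to get these quantities, sketched under the assumption that the data have been read into a data set named linth (a hypothetical name). The model labels slr and both are also just illustrative.

```sas
proc reg data=linth;
  * Simple linear regression of BIO on Zn ;
  slr: model bio = zn;
  * Two-variable model with Type I and Type II sums of squares ;
  both: model bio = ph zn / ss1 ss2;
run;
quit;
```

In the two-variable model, the Type II SS for pH is R(pH|X0, Zn) and the Type II SS for Zn is R(Zn|X0, pH). Each has 1 degree of freedom, so dividing it by the MSE of that same model gives the F statistic.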
C. Regress BIO on all the other available numeric
variables. Construct a 95% confidence interval for
the coefficient of Zn in this full regression.
(See Rawlings et al., pp. 164-165.)
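A short sketch for part C, again assuming a data set named linth (hypothetical). The CLB option asks PROC REG for confidence limits on every coefficient, and ALPHA=0.05 makes them 95% limits.

```sas
proc reg data=linth;
  * Full model with confidence limits for all coefficients ;
  full: model bio = sal ph k na zn / clb alpha=0.05;
run;
quit;
```

The Zn row of the output can be checked by hand as estimate plus or minus t(.975, error df) times the standard error, where the error df is n - 6 for this model.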
D. Rawlings suggests (pg. 170) that BIO can
be modeled using just pH and K. By subtraction,
test the full model from part C against this reduced
model.
Repeat the same test in a single PROC REG by
using a test statement. Write down the K and M
matrices as in H0: K Beta - M = 0 (K is a matrix
here, not a symbol for potassium) that are being
used in this test assuming the variables are
entered in the order in the text:
intercept, Sal, pH, K (potassium), Na, and Zn.
Do the test a third time by appropriately rearranging
variables in your full MODEL statement and requesting the
type I sums of squares. Explain how the F test can be
computed from these Type I SS.
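The TEST statement and the Type I approach might be sketched together as follows. The data set name linth and the labels full and drop3 are hypothetical; the variable order in the MODEL statement is one choice that puts the tested variables last.

```sas
proc reg data=linth;
  * Full model with the variables to be tested entered last ;
  full: model bio = ph k sal na zn / ss1;
  * Joint test that the Sal, Na and Zn coefficients are all zero ;
  drop3: test sal = 0, na = 0, zn = 0;
run;
quit;
```

With pH and K entered first, the Type I SS for Sal, Na and Zn add up to the drop in error sum of squares from the reduced model to the full model. Dividing that sum by 3 and then by the full-model MSE should reproduce the F from the subtraction method and from the TEST statement.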
E. In the full model, Table 5.2 gives the 5 estimated
coefficients. Suppose (just for illustration) I want
to test that the true values of these coefficients are,
respectively,
-35.0000, 300.0000, -1.0000, 0, and -18.0000
Compute the K and M matrices for this test as in
H0: K Beta - M = 0. (K is a matrix here, not a
symbol for potassium) Issue a test statement in
PROC REG to test this hypothesis.
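A sketch of the TEST statement for part E. It assumes that the five values are listed in the text's variable order (Sal, pH, K, Na, Zn) and that the data are in a data set named linth; both are assumptions to verify against Table 5.2.

```sas
proc reg data=linth;
  full: model bio = sal ph k na zn;
  * One equation per coefficient with the hypothesized value on the right ;
  values: test sal = -35, ph = 300, k = -1, na = 0, zn = -18;
run;
quit;
```

Each equation in the TEST statement corresponds to one row of K and one entry of M, so the statement can be used to check the matrices you write down.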
F. Suppose (just for illustration) I want to test the
joint null hypothesis that the true coefficients of SAL
and Zn are the same as each other, the coefficient
of Na is 0, and the coefficient of K is -0.5.
(i) Regress BIO + 0.5*K on pH and some linear
combination of SAL and Zn with an intercept
so that this gives your reduced model. Compute
the F test by comparing this to the full model.
(ii) Compute the F test by writing your K and M
matrices, then issuing a test statement in
the full model to do the job.
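Both routes for part F might look like the sketch below. The names linth, linth2, ynew and salzn are hypothetical, and using SAL + Zn as the linear combination is one reading of the hypothesis that the two coefficients are equal.

```sas
data linth2;
  set linth;
  ynew  = bio + 0.5*k;   * response adjusted for the hypothesized K coefficient ;
  salzn = sal + zn;      * forces a common coefficient on SAL and Zn ;
run;

proc reg data=linth2;
  * Reduced model implied by the null hypothesis ;
  reduced: model ynew = ph salzn;
  * Full model followed by the equivalent three-equation test ;
  full:    model bio = sal ph k na zn;
  joint:   test sal - zn = 0, na = 0, k = -0.5;
run;
quit;
```

The hypothesis imposes 3 restrictions, so the subtraction F in (i) is the difference in error sums of squares divided by 3 and then by the full-model MSE. It should agree with the F printed for the TEST statement in (ii).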
G. Using the variable names as above, the program below
creates the principal components of your data. Do prin1
and prin2 have interpretations? (I did not see any neat
ones. Do you? There is no right or wrong answer.)
Plot prin1 vs prin2 using location as a plot symbol.
   proc princomp out=outlin;
      var bio sal ph K Na Zn;
   run;
   proc gplot data=outlin;
      plot prin1*prin2=loc / legend;
      symbol1 v=dot i=none c=red;
      symbol2 v=dot i=none c=green;
      symbol3 v=dot i=none c=cyan;
   run; quit;
Plot the first 3 principal components in a 3-D plot, again
using color to denote location:
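   data outlin;
      set outlin;
      length c $ 5;   * long enough for GREEN so no value is truncated ;
      if loc = "OI" then c = "RED";
      else if loc = "SI" then c = "GREEN";
      else if loc = "SM" then c = "CYAN";
   run;
   proc g3d data=outlin;
      scatter prin3*prin2=prin1 /
         shape="balloon" size=.75 color=c;
   run; quit;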
Turn in the plots. Any comment about whether the locations
form clusters in these plots? (again, no right or wrong, just
note that this is one kind of thing you might do with principal
components)
Optional: use SAS INSIGHT or JMP to create a rotating 3-D
plot. (Nothing to hand in.)
(Problem 4.3 is postponed to next time.)
(2.) Practice with quadratic forms:
Suppose E is a column vector of 5 independent N(0,1)
random variables, that is, these are normally distributed
with mean 0 and variance 1.
Suppose also that V is this matrix:
        13   6  -1  -1  -1
         6   4   2   2   2
   V =  -1   2   5   5   5
        -1   2   5   5   5
        -1   2   5   5   5
Is V symmetric? Find, if possible, a scalar multiple
m of V so that A = mV is idempotent.
(hint: Compute VV and compare it to V).
Let Y be the quadratic form Y = E'VE.
Find the expected value of Y.
Find a value c such that Pr{Y > c} = .05.
(hint: Express E'VE as Z'AZ where A=mV is idempotent)
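The reasoning behind the two hints can be sketched in general terms. This assumes V is symmetric and that the comparison of VV with V turns up a positive scalar m with VV = (1/m)V; the symbol r for the rank of A is introduced here for illustration and is not in the problem statement.

```latex
\begin{align*}
VV = \tfrac{1}{m} V \;&\Longrightarrow\; A = mV \text{ satisfies } AA = m^2\, VV = mV = A, \\
Y = E'VE &= \tfrac{1}{m}\, E'AE, \qquad E'AE \sim \chi^2_r, \quad r = \operatorname{rank}(A) = \operatorname{tr}(A), \\
\mathrm{E}[Y] &= \tfrac{1}{m}\operatorname{tr}(A) = \operatorname{tr}(V), \\
\Pr\{Y > c\} = 0.05 \;&\Longleftrightarrow\; c = \tfrac{1}{m}\,\chi^2_{r,\,0.95}.
\end{align*}
```

Here the last line uses the 95th percentile of the chi-square distribution with r degrees of freedom, which can be read from a table or obtained from a quantile function.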
*/