Homework 2 St 708
/* ---------------------------------------Due Sep. 9 2003-----------
| This exercise continues to review linear regression, illustrates |
| projection ideas, and gives practice with more IML matrix tools. |
-------------------------------------------------------------------*/
Questions
1. The SAS IML function DET(M) returns the determinant of M. Using this
function, compute the determinants of the matrices B and C below:
       1   3   2   5            1   3   1   5
 B =   1   2   5   3      C =   1   2   0   3
       1  -1   2   2            1  -1   2   2
       5   2  -1   4            5   2  -6   1
What does this information tell you about these matrices?
The statement D = inv(A); returns D as the inverse of matrix A. Use this
to compute the inverses of B and C if possible.
If either B or C is not full rank, find a nontrivial column vector V such that
BV (or CV) is a column of 0s. Is there ANOTHER nontrivial V that will work and
which is not just a multiple of your first V? How do you know? ("trivial"
means all 0 entries).
As a review of some matrix operations, pick out the middle 2 columns of
B and stack them on top of the middle 2 columns of C to get a new 8x2 matrix.
Output this to a dataset and plot the first column against the second.
Class demo programs IML_commands.sas and MAT2DAT.sas should have most
of the commands you'll need. Submit the plot and the IML code you used to
do the job.
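The matrix steps named in question 1 (determinant check, guarded inverse, picking out columns, stacking, and writing a dataset) can be sketched as below. This is only a sketch under assumptions: the 3x3 matrices M and N, the dataset name STACKED, and the variable names C1 and C2 are hypothetical stand-ins, not the B and C of the assignment, and the class demo programs may organize the work differently.

```sas
PROC IML;
  /* Hypothetical 3x3 examples, NOT the B and C of question 1 */
  M = {2 0 1,
       1 3 0,
       0 1 4};
  N = {1 2 3,
       2 4 6,
       0 1 1};              /* row 2 = 2*(row 1), so N is singular */
  dM = det(M);  dN = det(N);
  print dM dN;
  /* Invert only when the determinant is clearly nonzero */
  if abs(dM) > 1e-8 then do;
     MI    = inv(M);
     check = M*MI;          /* should be close to I(3) */
     print MI check;
  end;
  /* Columns 2 and 3 of M stacked on top of columns 2 and 3 of N */
  S = M[ ,2:3] // N[ ,2:3];                 /* 6x2 matrix */
  create stacked from S [colname={C1 C2}];  /* assumed dataset and names */
  append from S;
  close stacked;
QUIT;
PROC GPLOT DATA=stacked;
  PLOT C1*C2;               /* first column against the second */
RUN;
```

The tolerance 1e-8 is an arbitrary choice for the sketch. A determinant that is tiny but not exactly 0 can simply reflect rounding, so it is worth looking at the printed value rather than trusting the test alone.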
2. Find, if possible, values a,b,c and d such that all four of these
equations are satisfied:
a + 3b + 2c + 5d = 10
a + 2b + 5c + 3d = 3
a - b + 2c + 2d = -1
5a + 2b - c + 4d = 11
Explain how knowing the inverse of matrix B from problem 1 can help you
in solving these equations and show how to program the solution
in SAS PROC IML.
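The general idea behind question 2 is that a system written as A*sol = rhs with A invertible has the single solution sol = inv(A)*rhs. A minimal sketch follows, assuming a hypothetical 2x2 system (2a + b = 5, a + 3b = 10) rather than the four equations above. The names A, rhs, sol and check are illustrative only.

```sas
PROC IML;
  /* Hypothetical system:  2a + b = 5,  a + 3b = 10 */
  A   = {2 1,
         1 3};
  rhs = {5, 10};
  if abs(det(A)) > 1e-8 then do;
     sol   = inv(A)*rhs;    /* inverse times right-hand side */
     check = A*sol;         /* should reproduce rhs */
     print sol check rhs;
  end;
QUIT;
```

For this small system the solution is a = 1, b = 3, which is easy to confirm by substitution. IML also provides SOLVE(A, rhs), which gives the same answer without forming the inverse explicitly.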
3. Fill in the missing entry of matrix D in such a way that D
has no inverse. How many right answers are there to this problem?
7 4 6
D = 2 2 12
4 5 __
4. Compute the matrix P = X (X'X)^(-1) X' for the matrix
1 -2
1 -1
X = 1 0
1 1
1 2
Check to see if P is idempotent. Show how we can use a quick method
from the book (or class lecture) to find the rank of P without knowing
its relationship to X. [Note: Because we know how P and X are related, we
see that every column of P is a linear combination of the two X columns
so we know from that fact that rank(P) = 2 and we know that P is idempotent.
I am asking you to check these as though you did not know about X.]
Write down any column vector with 5 entries, calling the vector Y. Compute
PY, PPY, PPPY, and PPPPY. Are the elements of PY "equally spaced" (in other
words is the difference between the first and second entry the same as
the difference between the second and third, third and fourth, etc.)?
If I run a simple linear regression (with intercept) of your column of 5
original Y values on the intercept column and a column X =(1 2 3 4 5)',
take the residuals R, and then project them with P, what will be the
resulting PR? (hint: You can do the computation but you should know
without doing it).
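The checks described in question 4 can be tried first on a smaller case. The sketch below assumes a hypothetical 3x2 matrix Z and vector Yh (neither is from the assignment). It also relies on one commonly used fact, that the trace of an idempotent matrix equals its rank, which may or may not be the quick method the book has in mind.

```sas
PROC IML;
  /* Hypothetical matrix (intercept plus one column), not the X above */
  Z = {1 0,
       1 1,
       1 2};
  P       = Z*inv(Z`*Z)*Z`;        /* projection onto the columns of Z */
  maxdiff = max(abs(P*P - P));     /* near 0 when P is idempotent */
  tr      = trace(P);              /* trace of an idempotent matrix = rank */
  Yh  = {3, 1, 4};                 /* arbitrary hypothetical vector */
  PY  = P*Yh;
  PPY = P*PY;                      /* projecting a second time */
  print P maxdiff tr, PY PPY;
QUIT;
```

Comparing PY with PPY in the output shows what repeated projection does, and the same pattern carries over to the 5x5 case.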
5. Show that the matrix X below is a projection matrix.
| .5 0 0 0 .5 |
| 0 .5 0 .5 0 |
X = | 0 0 1 0 0 |
| 0 .5 0 .5 0 |
| .5 0 0 0 .5 |
Compute the projections of the vectors V1 = {1 1 1 1 1 }' and
V2 = {1 3 3 3 1}' and V3 = {1 2 3 4 1}'.
Find, if possible, scalar constants a and b such that the projection of
the vector aV1 + bV2 is not just aV1 + bV2.
Is the space spanned by V1 and V2 equal to the space into
which X projects vectors? Is it a subspace? How do you know?
Is the space spanned by V1 and V3 equal to the space into
which X projects vectors? Is it a subspace? How do you know?
6. Here are some data from problem 1.4 page 30 and a simple
PROC REG in SAS. Y is resting heart rate and X is body weight
in kilograms:
DATA HEART;
INPUT X Y; CARDS;
90 62
86 45
67 40
89 55
81 64
75 53
;
PROC GPLOT; PLOT Y*X;
symbol1 v=dot c=red i=RL;
PROC REG; MODEL Y=X/P R XPX I COVB;
********************************************
** note how PROC REG borders the X'X matrix
with X'Y, Y'Y etc. and similarly b and
SSE form a border for the inverse of X'X
This is part of the sweep operator. Look
at the log window where the equation of
the line in the plot is shown.
*******************************************;
RUN;
*** Now we try this in IML ***;
PROC IML; reset spaces=5;
X = { 1 90, 1 86, 1 67, 1 89, 1 81, 1 75};
Y = {62, 45, 40, 55, 64, 53} ;
XPX=X`*X; XPY=X`*Y; IXPX =inv(XPX); b= ;
print X Y XPX XPY;
ident = I(6);
print IXPX b ident;
Questions:
(a) Add the computation of the estimated parameter vector b to
the program and verify that vector b gives the regression
coefficients.
(b) The error sum of squares is Y`Y - b`X`Y in matrix form
and it has n-2 degrees of freedom. Add the computation of
the error MEAN square MSE to the IML program.
(c) Verify that if you multiply the inverse X'X matrix, IXPX,
by MSE you get the COVB matrix from PROC REG.
(d) Verify that if you take the square roots of the diagonal
elements in COVB you will get the "standard errors" delivered
by PROC REG.
(e) Compute, in IML, the projection matrix P for this regression
problem. Compute PY and verify that this delivers the same predicted
values as PROC REG.
(f) Compute the matrix I-P for this regression and show that
(I-P)Y is the vector of residuals delivered by PROC REG.
Also compute and print Y'(I-P)Y. Does this appear on your
PROC REG output somewhere?
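The quantities in parts (a) through (f) follow one pattern, sketched below on hypothetical data (four made-up points, not the HEART data). All variable names other than XPX, XPY, IXPX and b are assumptions made for the sketch.

```sas
PROC IML;
  /* Hypothetical data, NOT the HEART data above */
  X = {1 1, 1 2, 1 3, 1 4};
  Y = {2, 3, 5, 4};
  n = nrow(X);  k = ncol(X);
  XPX = X`*X;  XPY = X`*Y;  IXPX = inv(XPX);
  b    = IXPX*XPY;                /* least squares coefficients */
  SSE  = Y`*Y - b`*XPY;           /* error sum of squares */
  MSE  = SSE/(n-k);               /* n-2 df for a line with intercept */
  COVB = MSE*IXPX;                /* estimated covariance matrix of b */
  SE   = sqrt(vecdiag(COVB));     /* standard errors */
  P    = X*IXPX*X`;               /* projection matrix */
  PRED = P*Y;                     /* predicted values */
  RES  = (I(n)-P)*Y;              /* residuals */
  SSE2 = Y`*(I(n)-P)*Y;           /* should agree with SSE */
  print b SE MSE, PRED RES, SSE SSE2;
QUIT;
```

Running PROC REG with the P R XPX I COVB options on the same four points gives something to compare each printed piece against.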
7. Optional (not graded) : For the matrix A below, write out
a degree 4 polynomial whose roots are the eigenvalues of A.
A = {4 0 2 1,
0 4 1 0,
2 1 6 2,
1 0 2 2};
I hope you will do this by hand, but the program below can be used
to check your answer (and it contains some interesting SAS code).
Note: In the following program, the determinant function is used to
compute f(L) = | A - L*I | for a sequence of Ls. Note that I(4) is
a 4x4 identity matrix and note the method for passing matrices into
SAS datasets. Also V must be initialized so it exists within the DO
loop, hence the shape function.
Next the program uses PROC GLM to find the polynomial equation for
f(L). Hence this can be used to check your algebra.
=====================================================================
PROC IML;
A = {4 0 2 1,
0 4 1 0,
2 1 6 2,
1 0 2 2};
V = shape(0,8,2); ** Dummy matrix of 0;
do L = 1 to 8; V[L,1]=L; V[L,2]= det(A - L*I(4)); ** I(4) = identity;
end; ** note {} matrices
[] options or elements
() functions;
create data1 from V [colname={L FL} ];
append from V; ** read V into dataset;
proc glm; model FL= L L*L L*L*L L*L*L*L L*L*L*L*L;
run;
=====================================================================
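Another way to cross-check roots, sketched here under assumptions, is to ask IML for the eigenvalues directly and confirm that | S - L*I | vanishes at each one. The 2x2 matrix S is a hypothetical example, not the A of question 7. EIGVAL is used because S is symmetric, as A is.

```sas
PROC IML;
  /* Hypothetical symmetric 2x2 matrix, not the A of question 7 */
  S  = {2 1,
        1 2};
  ev = eigval(S);                   /* eigenvalues of a symmetric matrix */
  f  = j(nrow(ev), 1, 0);           /* holds |S - L*I| at each eigenvalue */
  do k = 1 to nrow(ev);
     f[k] = det(S - ev[k]*I(2));    /* should be 0 up to rounding */
  end;
  print ev f;
QUIT;
```

For this S the characteristic polynomial is L*L - 4*L + 3, so the eigenvalues are 3 and 1 and both entries of f should print as essentially 0.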
Some IML functions:
C = Det(M) (scalar C is determinant of M)
V = shape(0,8,2) (V is an 8x2 matrix of 0s)
create & append (form a dataset from a matrix -
colname matrix contains variable names)
ID = I(4) (ID is a 4x4 identity matrix)
B = inv(A) (B is inverse of A)
+ - * (add subtract multiply)
C = A*B (ordinary product OR scalar product if A or B scalar)
(can also have A +/- B with B scalar)