Homework 1 Due Friday Jan. 18

Problems 2, 3, 4, and 8, pages 17-19 of Fuller.

For problem 2, recall that the t distribution with d degrees of freedom has density
   f(x) = C(1 + x*x/d)**(-(d+1)/2)
where C is chosen to make the area 1 and ** denotes exponentiation. Add these parts to the question:
(b-i) Compute the integral from 1 to n of 1/x. (Recall (d/dx)ln(x) = 1/x.)
(b-ii) Discuss the convergence of the integral from 1 to infinity of 1/x.
(b-iii) Is the function |x|f(x) integrable (over the real line)?
Now answer problem 2b using what you've shown.
Optional (not graded): compute the constant C in f(x) and give another name that is often associated with this distribution.
----------------------------------------------------------------------------
Homework 2 Due Monday Jan. 28

I. A realization of a time series of 100 consecutive Y values has average 85.63, where Y is the percentage of sulphur removed by a "scrubber" from the smoke coming out of a coal burning furnace. Y is assumed to follow the moving average model
   Y(t) = mu + e(t) + .8 e(t-1)
where e(t) is independent white noise with mean 0 and known variance 25.
(a) Compute the VARIANCE of the mean of 100 consecutive Y values.
(b) Compute a 95% confidence interval for mu.
(c) The furnace operator will be fined if the average of 100 consecutive Y values falls below 82% removal. If he is trying to cut it close and run at a long term mean mu=83.5 percent removal, what is the probability that he will be fined based on one average of 100 consecutive observations?

II. Work problems 2 and 4 on pg. 101, 102 of Fuller.
------------------------------------------------------------------------
Homework 3, due Wed. Feb. 6

1. A city has a waste treatment plant that can hold 9100 units of liquid waste. Let Y(t) represent the amount in the plant on day t. In going from day t-1 to day t the following happens. First, a proportion 1-p of what was in the plant at day t-1 gets filtered and leaves the plant.
Then a random amount 900+e(t) enters the plant, where e(t) ~ N(0,684). Suppose the plant is empty to start with; then after the first day it would have 900+e(1) units and after the second it would have
   previous - filtered + new = (900+e(1)) - (1-p)(900+e(1)) + (900+e(2))
etc.
(a) To what do the mean and variance converge after a long time has passed? The answer may involve p, of course. Hint: E{Y(1)}=900, E{Y(t)}=pE{Y(t-1)}+900, t>1.
(b) Using the above, approximate the probability that a randomly chosen day in the distant future will have waste that overflows the plant, assuming p is .9 (10% gets filtered each day).
(c) In the distant future, what would be the mean and variance of the day-to-day changes Y(t)-Y(t-1)? [For those interested in spatial statistics, the variance of Y(t)-Y(t-j) is related to a quantity called the "variogram" at lag (or distance) j.]
(d) One day in the distant future, the plant gets to 9090 units of liquid waste. Conditional on this fact, what is the probability of an overflow the next day?
(e) Take a look at the process by executing the following SAS code:

   DATA WASTE;
      Y=0; P=.9;                         /* start empty; 10% is filtered each day */
      DO DAY=1 TO 100;
         NEW=900+SQRT(684)*NORMAL(0);    /* inflow 900+e(t), e(t)~N(0,684) */
         FILTERED=(1-P)*Y;               /* amount leaving the plant */
         Y=Y+NEW-FILTERED;
         OUTPUT;
      END;
   RUN;
   PROC GPLOT;
      PLOT Y*DAY/VREF=9100;              /* reference line at plant capacity */
      SYMBOL1 V=NONE I=JOIN C=RED;
   RUN;

2. (a) Compute the first 5 partial autocorrelations of the series
   Y(t) = (1 - 1.2 B + 0.38 B**2)e(t).
(b) Can e(t) be expressed as a convergent infinite weighted sum of past observations Y(t)? If so, what are the weights for Y(t-1), Y(t-2), and Y(t-3)?
(c) We talked about running "population regressions" as part of our method for computing partial autocorrelations. In the population regression of Y(t) on Y(t-1), Y(t-2), and Y(t-3), what are the 3 (population) regression coefficients? You will probably want to use SAS PROC IML with the class notes as a guide.
------------------------------------------------------------------------
Homework 4, due Fri. Feb.
15

Work problems 5, 8 (assume this system can involve other gammas as well; that is, I assume he is not asking us to algebraically solve for gamma(0)), 9, 15, 16, 24, pages 102-106 of Fuller.

Also: The series Y(t) = e(t) - e(t-1) is not invertible. Nevertheless, there would be a best predictor of Y(4) based on Y(1), Y(2), and Y(3), namely
   -.75Y(3) - .50Y(2) - .25Y(1)
because

   |  2 -1  0 | | -.75 |   | -1 |
   | -1  2 -1 | | -.50 | = |  0 |
   |  0 -1  2 | | -.25 |   |  0 |

(A) Compute the prediction error variance for this predictor.
(B) Consider
   b(j) = -(n+1-j)/(n+1), j=1,2,...,n
and the predictor
   b(1)Y(n) + b(2)Y(n-1) + ... + b(n)Y(1).
Show that b(j) satisfies the nxn version of the above equations.
(C) You have proved that the b(j) sequence gives the BLUP of Y(n+1). Compute the prediction error variance and its limit as n approaches infinity (in terms of the variance of e(t)).
------------------------------------------------------------------------------
Homework 5 Due Monday Feb. 25

Work problem 2 page 300 of Fuller.

Also: I have a linear trend regression model
   Yt = 10 + 3 t + et
where the et are N(0,1) random variables. Let a and b be the estimates of 10 and 3 (the parameters) from an ordinary least squares regression of Yt on t using n data points (t=1,2,...,n).
(A) Compute the limit of this Riemann sum:
   Sum of (t/n - (n+1)/(2n))(t/n - (n+1)/(2n))(1/n)
Find the order (O, not Op) of the sum of squared (t - (n+1)/2) values, relating your answer to the Riemann sum result. Also: what is the mean of t in terms of n? Hint: Add the n and the 1, the n-1 and the 2, the n-2 and the 3, etc., then divide by n.
(B) Write down the (St 511-512) formulas for the variance of a and of b. Give the strongest possible Op statements for these:
   a = Op( )
   (a-10) = Op( )
   b = Op( )
   (b-3) = Op( )
(C) Using the original uncentered regression, suppose I predict Y(n+1) = 10 + 3 (n+1) + e(n+1) using Yhat(n+1) = a + b(n+1) as usual.
Complete the following with the strongest possible Op( ) statement:
   Y(n+1) - Yhat(n+1) = e(n+1) + Op( )
(of course this result would have to be the same whether or not you center the regression).
---------------------------------------------------------------------------
* Homework Due Wed. April 3 (after Easter break)
  recall: Test next Monday

  We will do a simulation to check out the estimation results we've shown thus far. Here is a SAS program to generate 50 observations from an AR(2) model and get some parameter estimates. It generates 2 such sets. The number 1234567 is called a "seed" and is only used on first encounter of the random number generator. Let's all use this same seed to make the grading easier. Also sig sets the standard deviation of e(t) and I will ask you to change the values of C1, C2, and C3 later;

Data St782;
   sig=1; C1=1; C2=1; C3=1; n=50;
   drop sig C1 C2 C3 n;
   do rep=1 to 2;
      Y2 = sig*C1*normal(1234567);
      Y1 = C2*Y2+ sig*C3*normal(1234567);
      do t=1 to 50;
         Y = 2 + 1.2*Y1 - .32*Y2 + sig*normal(1234567);
         output;
         Y2=Y1; Y1=Y;
      end;
   end;
proc print data=st782(obs=52 firstobs=48);
proc reg outest=parms /*noprint*/;
   model Y=Y1 Y2;
   * by rep;
   where rep=1;
proc print data=parms;
run;

*(1) Compute the theoretical mean and first 4 autocovariances of the process. Compute the roots r1 and r2 of the (theoretical) characteristic equation. Explain what Y2=Y1; Y1=Y; is doing. (Do you remember why we can use C1 C2 etc. after they are "dropped"??)

(2) Regress the theoretical autocovariances h=1 through h=4 on columns r1**h and r2**h with no intercept. Is the fit exact? How does this relate to difference equations that we studied earlier? (** is exponentiation, e.g. 2**3=8.)

(3) How far apart would two observations have to be in order for the correlation between them to drop below 0.5? What is the correlation between observations that far apart?

(4) You can see that in rep 2, the series restarts near 0.
That is because the variance-covariance structure of the first two observations is not right when all the Cs are set to 1, and the mean is not right for them either. Motivated by this, compute C1, C2, and C3 so that sig*C1*normal(1234567) and C2*Y(t)+ sig*C3*normal(1234567) produce 2 consecutive Y(t) values that both have variance Gamma(0) and have covariance Gamma(1), i.e. have the proper variance structure. You'll also have to start with the right mean, so put in your theoretical mu value by adding statements Y2=Y2+mu; Y1=Y1+mu; just before you enter the loop. For the homework assignment, just tell us what you use for mu and the C values. From here on, use the adjusted program so that you are generating stationary series from the start.

(5) Change the reps to 1000 and use the proper mean and covariance structure for the first observations. Make histograms of the 1000 values of the two AR parameter estimates. What are the theoretical means and variances (approximate, based on large sample theory) for these? What are the empirical means and variances based on your 1000 independent samples? For rep 1, what are the reported standard errors for these two AR parameter estimates? According to large sample theory, can these be trusted? What are the theoretical values to which these standard errors correspond?

(6) Run PROC CORR on your 1000 estimated (Alpha1, Alpha2) pairs (AR parameter estimates). Theoretically, what correlation do you expect to see based on large sample theory?

(7) Test for finite sample bias (n=50) in each of the two estimators. Justify your use of the normal distribution to do the tests.

(8) Fuller, page 55, gives Gamma(0) as a function of the innovations standard deviation (sig) and the AR parameters. Estimate this function for each of your samples by plugging the sample estimates of sigma and the AR parameters into the formula.
Using these 1000 numbers, give an approximate 95% confidence interval for the expected value of this estimator (nothing fancy - just average your 1000 numbers and appeal to large sample theory to put an interval around the average). Is the theoretical Gamma(0) in your interval?

(9) If you changed the innovations standard deviation (sig) from 1 to 5, what would be the effect on the AR parameter estimates? Make some brief arguments to support your answer (not looking for just another computer run, although you can certainly check it out this way).

(10) (Optional - not graded) Engle (1982) proposed a model called ARCH - AutoRegressive Conditionally Heteroscedastic - that models the innovations variance as a function of previous errors. In this model, the variance of e(t) might be 1+.5e(t-1)*e(t-1), for example, where 1 and .5 would in general be parameters to estimate (see prob. 43 page 110). Modify the program to produce ARCH errors by keeping track of the previous e(t), call it e1 in the program, and using it to make sig=sqrt(1+.5*e1*e1). Now check out the effect on your AR parameter estimates. ;
---------------------------------------------------------------------------
Homework:

Consider the AR(2) model
   Y(t) - Mu = (Rho+Alpha)(Y(t-1) - Mu) - Rho Alpha (Y(t-2) - Mu) + e(t)
under the null hypothesis H0: Rho=1. Assume throughout that |Alpha|<1 and for simplicity that Mu=0. We have suggested regressing Y(t)-Y(t-1) on Y(t-1) and [Y(t-1)-Y(t-2)], with no intercept, to test H0. The result was that the coefficient on Y(t-1) needed to be modified to use the unit root tables, but the t test could be used as computed. Consider these alternate ways to fit the model:

Method 1: Regress Y(t)-Y(t-1) on [Y(t-1)-Y(t-2)] to get "a," an estimate of Alpha. Next, regress Y(t) - a Y(t-1) on [Y(t-1) - a Y(t-2)]. Use n(coeff-1) as a test, where coeff is the regression coefficient in this second stage regression.
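
The two regressions in Method 1 can be sketched numerically as follows. This is only an illustration under stated assumptions, written in Python with NumPy rather than the SAS used in class; the function names (simulate_ar2, noint_slope, method1), the choices Rho=1, Alpha=0.5, n=500, the zero start-up values, and the seed are all hypothetical and not part of the assignment. It shows the mechanics of the statistic and is not a substitute for the limit argument.

```python
import numpy as np

def simulate_ar2(n, rho, alpha, rng):
    """Y(t) = (rho+alpha) Y(t-1) - rho*alpha Y(t-2) + e(t), Mu = 0, zero start-up values (an assumption)."""
    e = rng.standard_normal(n + 2)
    y = np.zeros(n + 2)
    for t in range(2, n + 2):
        y[t] = (rho + alpha) * y[t - 1] - rho * alpha * y[t - 2] + e[t]
    return y[2:]

def noint_slope(x, z):
    """Slope from a no-intercept least squares regression of z on x."""
    return float(np.dot(x, z) / np.dot(x, x))

def method1(y):
    """Two-stage Method 1; returns (a, coeff, n*(coeff-1))."""
    d = np.diff(y)                      # d[k] = y[k+1] - y[k]
    # Stage 1: regress Y(t)-Y(t-1) on Y(t-1)-Y(t-2) to estimate Alpha
    a = noint_slope(d[:-1], d[1:])
    # Stage 2: regress Y(t) - a Y(t-1) on Y(t-1) - a Y(t-2)
    w = y[1:] - a * y[:-1]              # w[k] = y[k+1] - a*y[k]
    coeff = noint_slope(w[:-1], w[1:])
    n = len(w) - 1                      # observations used in stage 2
    return a, coeff, n * (coeff - 1.0)

rng = np.random.default_rng(12345)      # arbitrary seed, for reproducibility only
y = simulate_ar2(500, rho=1.0, alpha=0.5, rng=rng)
a, coeff, stat = method1(y)
print(f"a = {a:.3f}, coeff = {coeff:.4f}, n(coeff-1) = {stat:.2f}")
```

Repeating the last four lines over many independent series would give an empirical distribution of n(coeff-1) to compare with the unit root tables.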
Method 2: Regress Y(t) on Y(t-1), Y(t-2), write down the estimated characteristic polynomial, and get the largest root r. Compute n(r-1) as a test.

For method 1, normalize the numerator (by 1/n) and denominator (by 1/(n*n)) of (coeff-1). Show that these normalized sums of squares and cross products are the same in the limit using a as they would be using the true Alpha, and hence that n(coeff-1) converges to the distribution without any adjustment. (note: Under the alternative, this may or may not give good power.)

For method 2, write the estimated characteristic polynomial as a function of the two coefficients in the regression of [Y(t)-Y(t-1)] on Y(t-1) and [Y(t-1)-Y(t-2)]. We know something about their distribution. Now write the estimated largest root r (the one that uses the + sign) by applying the well known algebraic formula for the roots of a quadratic. Finally, expand this function of the estimated regression coefficients about the true coefficients using a Taylor series. For this problem, you can assume the remainder (after the two first partials) can be ignored, but of course that fact should also be proved. Using the linear terms of the expansion, compute the limit, noting what if any adjustments to n(r-1) are needed to make it converge to the distribution in the back of our text. (note: studentized versions of these statistics would also be of interest.)

Finally, let us look at passenger loadings at Raleigh-Durham airport. The data are in our class locker RDU.sas. We will analyze the data prior to the Sept. 11 incident. Regress the differenced data on time, lag Y, and 8 lagged differences. Report the unit root test and the decision you make with it. Explain, as you would to a researcher, why you would like to have a parsimonious model to do the testing. Make note of the loss of data due to lagging as well as the failure of the lag Y column to be orthogonal to the others (unlike what the limit theory suggests).
Give the F test for testing to see if you can leave out all the lagged differences. What do you conclude? Can you compare F to an F distribution here? Explain. Now pare down on the model by leaving out lagged differences. Now that you have, you hope, improved the power, what does your unit root test suggest? Do we have a random walk with drift or do we have stationary fluctuations around a linear trend? (note: If you have deterministic seasonal effects, you can also throw in seasonal dummy variables and it will not affect the limit distribution for the unit root tests - these effects are op(1) under H0. There is some indication of seasonal effects here and the inclusion of seasonal dummies will affect the results. Of course there is the usual tradeoff - inclusion of unnecessary terms depletes your power while failure to add needed terms invalidates the test