Applied Time Series Homework 3

STATISTICS 730: APPLIED TIME SERIES ANALYSIS, Fall 2004, Professor Dickey

HOMEWORK 3: Yule Walker Equations and Forecasts

DIRECTIONS: Complete the problems below. Show all of your work to receive credit. Make sure to include output from the SAS output window and graphs from the SAS graphics window WHEN YOU NEED TO SUPPORT YOUR ANSWERS!!!

For this homework, written in strict ASCII text, parentheses denote subscripts; for example, Y(t) denotes "Y sub t". The autocovariance function at lag h is denoted G(h). This assignment looks much longer than it really is because it contains a lot of explanation.

/***********************************************************/
[PART 1]

In this problem, we will work with the model Y(t)-100 = .8 [Y(t-1)-100] + e(t) + 1.2 e(t-1) + .35 e(t-2), where e(t) is a white noise process. Suppose the variance of the e(t) series is known to be 144.

[1: Wold representation] We have talked about expressing a time series process in terms of current and past white noise shocks; this is known as the "Wold representation" of the series. Write the first 4 terms of the Wold representation: Y(t)-100 = e(t) + ___ e(t-1) + ___ e(t-2) + ___ e(t-3) + ...
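You can check these coefficients with a few lines of code. The sketch below uses Python rather than SAS. It assumes the standard ARMA(1,2) recursion psi(0) = 1 and psi(j) = .8 psi(j-1) + theta(j), with theta(1) = 1.2, theta(2) = .35, and theta(j) = 0 for j > 2. The function name `wold_weights` is our own, not from the course notes.

```python
def wold_weights(phi, thetas, n):
    """Return the first n Wold (psi) weights of an ARMA(1,q) model.

    phi    -- AR(1) coefficient, e.g. 0.8
    thetas -- MA coefficients [theta(1), theta(2), ...], e.g. [1.2, 0.35]
    Uses psi(0) = 1 and psi(j) = phi*psi(j-1) + theta(j), theta(j) = 0 for j > q.
    """
    theta = [1.0] + list(thetas)          # theta(0) = 1 multiplies e(t)
    psi = []
    for j in range(n):
        t = theta[j] if j < len(theta) else 0.0
        prev = psi[j - 1] if j > 0 else 0.0
        psi.append(phi * prev + t)
    return psi

print([round(p, 4) for p in wold_weights(0.8, [1.2, 0.35], 6)])
```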

[2.1] Multiply both sides of the model by Y(t)-100 and take the expected value. I get G(0) = .8 G(-1) + ___, where G(h) is the autocovariance function at lag h and G(-1) = G(1). Fill in the blank (hint: make use of the Wold representation).

[2.2] Multiply both sides of the model by Y(t-1)-100 and take the expected value. Do the same for Y(t-2)-100. You now have 3 equations and 3 unknowns; namely, G(0), G(1), and G(2). Write the system of equations down. Solve for the G(h) values. G(0)=___, G(1)=___, G(2)=___. From these autoCOVARIANCES, compute the corresponding autoCORRELATIONS.
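The following Python sketch does the same algebra. It is a check on hand calculations, not a replacement for them. It assumes the expectations E{e(t-j) [Y(t-k)-100]} = sigma^2 psi(j-k) for j >= k (and 0 for j < k), which follow from the Wold representation. It then solves the three equations by substitution.

```python
sigma2, phi = 144.0, 0.8
theta = [1.0, 1.2, 0.35]            # theta(0), theta(1), theta(2)

# First three Wold weights: psi(0) = 1, psi(j) = phi*psi(j-1) + theta(j)
psi = [1.0]
for j in (1, 2):
    psi.append(phi * psi[-1] + theta[j])

# c[k] = E{[e(t) + 1.2e(t-1) + .35e(t-2)] * [Y(t-k) - 100]}
#      = sigma2 * sum over j = k..2 of theta(j) * psi(j-k)
c = [sigma2 * sum(theta[j] * psi[j - k] for j in range(k, 3)) for k in range(3)]

# System: G0 = phi*G1 + c0,  G1 = phi*G0 + c1,  G2 = phi*G1 + c2
G0 = (c[0] + phi * c[1]) / (1 - phi ** 2)   # substitute equation 2 into equation 1
G1 = phi * G0 + c[1]
G2 = phi * G1 + c[2]
rho = [1.0, G1 / G0, G2 / G0]               # autocorrelations at lags 0, 1, 2
print(G0, G1, G2, rho)
```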

[2.3] What can be said about G(h) for h>2? {Note: you have now written down and solved the "Yule-Walker" equations for this model.}
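One numerical way to check your answer is to compute G(h) directly from the Wold representation, as G(h) = sigma^2 * sum_j psi(j) psi(j+h), and inspect the successive ratios G(h)/G(h-1). The Python sketch below truncates the infinite sum at 500 terms. That is an assumption, but the neglected weights are far below double precision.

```python
sigma2, phi = 144.0, 0.8
theta = [1.0, 1.2, 0.35]            # theta(0), theta(1), theta(2)

# Wold weights psi(j) = phi*psi(j-1) + theta(j), truncated after N terms
N = 500
psi = []
for j in range(N):
    prev = psi[-1] if psi else 0.0
    psi.append(phi * prev + (theta[j] if j < len(theta) else 0.0))

def autocov(h):
    """G(h) = sigma2 * sum_j psi(j) * psi(j+h), using the truncated weights."""
    return sigma2 * sum(psi[j] * psi[j + h] for j in range(N - h))

G = [autocov(h) for h in range(8)]
ratios = [G[h] / G[h - 1] for h in range(1, 8)]   # ratios[0] = G(1)/G(0), ...
print([round(g, 2) for g in G])
print([round(r, 4) for r in ratios])
```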

[3: Forecasting] Suppose Y(150)=125, e(150)=20, and e(149)=10. Recall the variance of the e(t) series is known to be 144.

[3.1] Forecast Y(151).

[3.2] Compute upper and lower 95% prediction limits assuming normality.
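Here is a minimal Python sketch of the one-step calculation. The unknown shock e(151) is replaced by its mean of zero. The forecast error is then e(151) itself, with standard deviation 12. The 1.96 multiplier comes from the standard library's `statistics.NormalDist` (Python 3.8 or later).

```python
from statistics import NormalDist

mu, phi, th1, th2 = 100.0, 0.8, 1.2, 0.35
sigma = 12.0                             # sqrt of the known shock variance 144
y_n, e_n, e_nm1 = 125.0, 20.0, 10.0      # Y(150), e(150), e(149)

# e(151) is unknown; replace it by its mean, 0.
yhat_151 = mu + phi * (y_n - mu) + th1 * e_n + th2 * e_nm1

# The one-step error is e(151) itself, so its standard deviation is sigma.
z = NormalDist().inv_cdf(0.975)          # about 1.96
lower, upper = yhat_151 - z * sigma, yhat_151 + z * sigma
print(yhat_151, round(lower, 2), round(upper, 2))
```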

[3.3] Similarly, forecast Y(152) and compute upper and lower 95% prediction limits. (Hint: Basically, everything is known up through t=150. Look at the Wold representation. If everything is known up to time t-2, what would remain unknown? What is the variance of this prediction error?)

[3.4] Using the hint in [3.3], what is the covariance between the one and two step ahead prediction errors?
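The following Python sketch covers the two-step calculation. It assumes the parameters are known exactly, as in question [3]. It writes the two prediction errors in Wold form:

- err1 = e(151)
- err2 = e(152) + psi(1) e(151), where psi(1) = .8 + 1.2 is the e(t-1) weight.

The variance and covariance follow directly from these two expressions.

```python
from math import sqrt
from statistics import NormalDist

mu, phi, th1, th2, sigma2 = 100.0, 0.8, 1.2, 0.35, 144.0
y_n, e_n, e_nm1 = 125.0, 20.0, 10.0      # Y(150), e(150), e(149)
psi1 = phi + th1                         # coefficient of e(t-1) in the Wold form

yhat_151 = mu + phi * (y_n - mu) + th1 * e_n + th2 * e_nm1
# Two steps ahead, e(151) and e(152) are both set to 0, but e(150) still
# enters through the .35 e(t-2) term.
yhat_152 = mu + phi * (yhat_151 - mu) + th2 * e_n

# err1 = e(151), err2 = e(152) + psi1 * e(151)
var_err2 = sigma2 * (1 + psi1 ** 2)
cov_err1_err2 = psi1 * sigma2            # only e(151) is shared by the two errors

z = NormalDist().inv_cdf(0.975)
lower2, upper2 = yhat_152 - z * sqrt(var_err2), yhat_152 + z * sqrt(var_err2)
print(yhat_152, var_err2, cov_err1_err2, round(lower2, 2), round(upper2, 2))
```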

[3.5] Write a SAS (or other) program that starts with Y(n)=125, e(n)=20, e(n-1)=10, then generates Y(t) = 100 + .8 [Y(t-1)-100] + e(t) + 1.2 e(t-1) + .35 e(t-2) for t = n+1, n+2, ..., n+10, assuming e(t)=0 for t>n. This is the basic recursion that time series programs use to produce forecasts into the future. Plot Y(n+L) versus L to see what the forecasts look like.
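Since the problem allows "other" languages, here is one possible sketch in Python. The function name `forecast_path` and its default arguments are our own choices. It runs the recursion with every future shock set to zero and shifts the lagged values forward at each step.

```python
def forecast_path(y_n, e_n, e_nm1, leads=10, mu=100.0, phi=0.8, th1=1.2, th2=0.35):
    """Return [Y(n+1), ..., Y(n+leads)] from the model recursion with e(t) = 0 for t > n."""
    y_prev, e1, e2 = y_n, e_n, e_nm1     # Y(t-1), e(t-1), e(t-2) for t = n+1
    path = []
    for _ in range(leads):
        y = mu + phi * (y_prev - mu) + th1 * e1 + th2 * e2   # e(t) = 0 drops out
        path.append(y)
        # shift the lags: e(t-2) <- e(t-1), e(t-1) <- e(t) = 0, Y(t-1) <- Y(t)
        y_prev, e2, e1 = y, e1, 0.0
    return path

for lead, y in enumerate(forecast_path(125.0, 20.0, 10.0), start=1):
    print(lead, round(y, 3))
```

The printed (lead, forecast) pairs can be handed to any plotting tool (for example matplotlib's `plot`) to draw Y(n+L) against L.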

[3.6] Suppose someone asked you to predict observation 200. Because t=200 is so far into the future, you could easily give a very accurate approximation to the prediction and prediction error variance. Briefly explain.
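To see the long-horizon behaviour numerically, the following Python sketch assumes the parameters are known exactly. It pushes the forecast recursion 50 steps past t=150. It then compares two quantities:

- the 50-step prediction error variance, sigma^2 [psi(0)^2 + ... + psi(49)^2];
- the full infinite sum that defines G(0), truncated here at 500 terms.

```python
mu, phi, sigma2 = 100.0, 0.8, 144.0
theta = [1.0, 1.2, 0.35]            # theta(0), theta(1), theta(2)

# Forecast recursion from Y(150)=125, e(150)=20, e(149)=10, with e(t)=0 for t>150.
y_prev, e1, e2 = 125.0, 20.0, 10.0
for _ in range(200 - 150):
    y_prev, e1, e2 = mu + phi * (y_prev - mu) + theta[1] * e1 + theta[2] * e2, 0.0, e1
yhat_200 = y_prev

# Wold weights for the L-step error variance sigma2 * sum_{j<L} psi(j)**2
psi = []
for j in range(500):
    prev = psi[-1] if psi else 0.0
    psi.append(phi * prev + (theta[j] if j < len(theta) else 0.0))

var_200 = sigma2 * sum(p * p for p in psi[:50])   # lead L = 50
gamma0 = sigma2 * sum(p * p for p in psi)          # G(0), truncated at 500 terms
print(yhat_200, var_200, gamma0)
```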

[4] There are loads of impractical assumptions in question [3]. At best, you would have estimates of the model parameters, and the historical prediction errors e(t) would themselves be estimated, for example as residuals. One question is whether these residuals will be good estimates of the true errors. Let B denote the usual backshift operator, where B( Y(t) ) = Y(t-1), B(B( e(t) )) = e(t-2), and B(100) = 100 (the mean is constant). Let ** denote exponentiation, as in most computer programs. Our model is (1 - .8B) [Y(t)-100] = [1 + 1.2B + .35B**2] e(t).

[4.1] Recall: 1/(1-X) = 1 + X + X**2 + X**3 + X**4 + ... for |X|<1. Divide both sides of the model by (1 - .8B) and, using .8B in place of X, show that this formally reproduces the Wold representation (check the first 4 terms against question [1]).
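You can mimic the formal division numerically. Multiply the truncated geometric series 1 + .8B + .64B**2 + ... by the polynomial 1 + 1.2B + .35B**2 and collect powers of B. The helper `poly_mul` below is our own. Truncating the geometric series at n terms leaves the first n product coefficients exact.

```python
def poly_mul(a, b, n):
    """Coefficients of a(B) * b(B), keeping only powers B**0 .. B**(n-1)."""
    out = [0.0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                out[i + j] += ai * bj
    return out

n = 6
geometric = [0.8 ** k for k in range(n)]   # 1/(1 - .8B) = 1 + .8B + .64B**2 + ...
ma = [1.0, 1.2, 0.35]                      # 1 + 1.2B + .35B**2
wold = poly_mul(geometric, ma, n)
print([round(w, 4) for w in wold])
```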

[4.2] Express 1/[1 + 1.2B + .35B**2] as A/[1 + .7B] + C/[1 + .5B] where A and C are constants. This is known as the method of partial fractions.
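One way to check A and C starts from the factorization 1 + 1.2B + .35B**2 = (1 + .7B)(1 + .5B). Clearing denominators gives A(1 + .5B) + C(1 + .7B) = 1 for every B, so the constant and B coefficients on both sides must match. The Python sketch below solves that 2x2 system by Cramer's rule. It then spot-checks the identity at a few arbitrarily chosen values of B.

```python
# Matching coefficients in A(1 + .5B) + C(1 + .7B) = 1:
#   constant term:  1*A + 1*C  = 1
#   B term:        .5*A + .7*C = 0
a11, a12, b1 = 1.0, 1.0, 1.0
a21, a22, b2 = 0.5, 0.7, 0.0
det = a11 * a22 - a12 * a21
A = (b1 * a22 - a12 * b2) / det     # Cramer's rule
C = (a11 * b2 - b1 * a21) / det

def lhs(B):
    return 1.0 / (1 + 1.2 * B + 0.35 * B ** 2)

def rhs(B):
    return A / (1 + 0.7 * B) + C / (1 + 0.5 * B)

for B in (0.1, 0.3, -0.4):           # arbitrary check points
    assert abs(lhs(B) - rhs(B)) < 1e-12
print(A, C)
```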

[4.3] Using the results from [4.2], show that the backshift ratio operator R(B) = (1 - .8B)/[1 + 1.2B + .35B**2] expands into an infinite series with coefficients that decay exponentially. This means that you can essentially extract the e(t) series from the Y(t) series if you know the parameters: e(t) = R(B) [Y(t)-100]. (You should understand where this comes from.) Compute the first 4 terms in this expansion of e(t) as a weighted sum of current and past Y(t) values: e(t) = ___ [Y(t)-100] + ___ [Y(t-1)-100] + ___ [Y(t-2)-100] + ___ [Y(t-3)-100] + ...
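The partial fractions are one route to these weights. A swapped-in alternative, used in the Python sketch below, matches powers of B in the identity [1 + 1.2B + .35B**2] R(B) = 1 - .8B. That gives the recursion r(j) = -1.2 r(j-1) - .35 r(j-2) + n(j), where n(0) = 1, n(1) = -.8, and n(j) = 0 otherwise. Both routes should give the same numbers. The function name `inverse_weights` is our own.

```python
def inverse_weights(n):
    """Coefficients r(0..n-1) of R(B) = (1 - .8B)/(1 + 1.2B + .35B**2), from
    matching powers of B in (1 + 1.2B + .35B**2) R(B) = 1 - .8B."""
    numerator = [1.0, -0.8]
    r = []
    for j in range(n):
        rj = numerator[j] if j < len(numerator) else 0.0
        if j >= 1:
            rj -= 1.2 * r[j - 1]
        if j >= 2:
            rj -= 0.35 * r[j - 2]
        r.append(rj)
    return r

r = inverse_weights(41)
print([round(x, 4) for x in r[:6]])
print(abs(r[40]))   # tiny: the weights die out roughly like .7**j
```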

Note: As you can see, the "convergence" of this set of weights is determined by the location of the roots of the moving average backshift polynomial. Because the roots of [1 + 1.2B + .35B**2] are -2 and -10/7, both greater than 1 in magnitude, the series is said to be "invertible." Invertibility is critical because we need estimates of the e(t) values whenever the model contains moving average terms. See the class notes.
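Here is what invertibility buys in practice, shown in a Python sketch with an arbitrary seed and Python's own random generator. The sketch first finds the MA roots with the quadratic formula. It then simulates the model and recovers e(t) from Y(t) alone, starting the inversion from a deliberately wrong guess e(-1) = e(-2) = 0. The recovery error obeys d(t) = -1.2 d(t-1) - .35 d(t-2), so it dies out.

```python
import cmath
import random

# Roots of 1 + 1.2B + .35B**2 via the quadratic formula (a = .35, b = 1.2, c = 1).
a, b, c = 0.35, 1.2, 1.0
disc = cmath.sqrt(b * b - 4 * a * c)
roots = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]
invertible = all(abs(root) > 1 for root in roots)

# Simulate Y(t) - 100 for t = 0..n-1, taking Y(-1) - 100 = 0.
rng = random.Random(730)                                 # arbitrary seed
n = 200
shocks = [rng.gauss(0.0, 12.0) for _ in range(n + 2)]    # e(-2), e(-1), e(0), ..., e(n-1)
ydev, y_prev = [], 0.0
for t in range(n):
    y_prev = 0.8 * y_prev + shocks[t + 2] + 1.2 * shocks[t + 1] + 0.35 * shocks[t]
    ydev.append(y_prev)

# Invert: e(t) = [Y(t)-100] - .8[Y(t-1)-100] - 1.2 e(t-1) - .35 e(t-2),
# starting from the deliberately wrong guess e(-1) = e(-2) = 0.
ehat, e1, e2, y_prev = [], 0.0, 0.0, 0.0
for t in range(n):
    e_t = ydev[t] - 0.8 * y_prev - 1.2 * e1 - 0.35 * e2
    ehat.append(e_t)
    e2, e1, y_prev = e1, e_t, ydev[t]

errors = [abs(ehat[t] - shocks[t + 2]) for t in range(n)]
print([round(abs(root), 4) for root in roots], invertible)
print(errors[0], errors[-1])    # start-up error versus error after 200 steps
```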

/***********************************************************/
[PART 2]

[1: Simulating a series] Run the following SAS program. Note the use of some throwaway observations to eliminate transient startup effects.

DATA SIMARMA;
  E1=0; E2=0; Y1=100;
  DO DATE = -20 TO 400;
    E = 12*NORMAL(123);
    Y = 100 + .8*(Y1-100) + E +1.2*E1 +.35*E2;
    IF DATE>0 THEN OUTPUT; E2=E1;E1=E; Y1=Y;
  END;
RUN;

[1.1] What is the purpose of E2=E1;E1=E; Y1=Y? Why does E get multiplied by 12?

[1.2] What are the first and last dates of the data? (Note: 57 is NOT a date. Feb 1, 2001 IS a date). Using your theoretical variance G(0), plot Y(t) versus date with horizontal reference lines at 100 plus and minus 2 standard deviations.
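SAS stores a date value as the number of days since January 1, 1960. The Python sketch below illustrates that convention. It converts a date value to a calendar date and computes the 100 plus-or-minus 2 standard deviation reference lines from whatever G(0) you obtained in Part 1. The names `sas_date` and `reference_lines` are our own.

```python
from datetime import date, timedelta
from math import sqrt

SAS_EPOCH = date(1960, 1, 1)          # SAS date value 0

def sas_date(value):
    """Calendar date for a SAS date value (days counted from 1960-01-01)."""
    return SAS_EPOCH + timedelta(days=value)

def reference_lines(g0, mean=100.0):
    """Mean plus and minus 2 standard deviations, where g0 is the variance G(0)."""
    sd = sqrt(g0)
    return mean - 2 * sd, mean + 2 * sd

print(sas_date(15007))                # 2001-02-01, the example date in the question
```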

[1.3] Run the following code (I, E, and F abbreviate the IDENTIFY, ESTIMATE, and FORECAST statements): PROC ARIMA; I VAR=Y; E P=1 Q=2 ML; F lead=3; Write down the estimated AR and MA coefficients and their standard errors. Are any of the coefficients within 2 standard errors of 0? Are any of them more than 2 standard errors from their true values? Write down the first 4 estimated autocovariances and autocorrelations next to their true values.
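When you compare estimated and true autocovariances, it helps to know what is being estimated. The Python sketch below implements the common sample autocovariance estimator with divisor n. Whether SAS uses exactly this convention is an assumption here; the PROC ARIMA documentation is the authority. The true values come from your Yule-Walker solution in Part 1.

```python
def sample_acov(y, max_lag):
    """Sample autocovariances with divisor n:
    G_hat(h) = (1/n) * sum_{t=0}^{n-h-1} (y[t] - ybar) * (y[t+h] - ybar)."""
    n = len(y)
    ybar = sum(y) / n
    d = [v - ybar for v in y]
    return [sum(d[t] * d[t + h] for t in range(n - h)) / n for h in range(max_lag + 1)]

def sample_acf(y, max_lag):
    """Sample autocorrelations G_hat(h) / G_hat(0)."""
    g = sample_acov(y, max_lag)
    return [gh / g[0] for gh in g]

print(sample_acov([1, 2, 3, 4, 5], 2))   # toy series, not the simulated data: [2.0, 0.8, -0.2]
```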

[2: A real data set] In business applications, data sets are often created and transmitted between third parties (e.g., consultants) using Microsoft Excel. In this problem, you will learn how to get SAS to communicate with Microsoft Excel. This is very typical of real-world data sets: there are a lot of obstacles to overcome. Our interest is in the quarterly GDP series (in billions of chained 2000 dollars).

[2.1] Go to the "Other datasets" link at the bottom of our course home page. Save the GDP data set (Gross Domestic Product, quarterly and yearly) as an *.xls file. Look at the data in Excel. Is there a column of blanks somewhere in there? On what line is the first numeric entry? Nothing to hand in here.

[2.2] Within Microsoft Excel, save the *.xls file as a tab-delimited *.txt file; for example, go to "File -> Save As ... -> Save as Type: Text (Tab delimited) *.txt". Then read the data into SAS with an INFILE statement.

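For example, infile "A:GDP.txt" firstobs=4 dsd dlm="09"x missover; is an INFILE statement that starts reading on the 4th line. It uses the TAB character (hexadecimal 09) as the delimiter (DLM=) and correctly interprets two consecutive tabs as a missing value (DSD). With MISSOVER, SAS does not jump to the next line when a record has fewer values than the INPUT statement asks for; it sets the remaining variables to missing instead. You will probably need to modify this statement appropriately.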
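To see what DSD changes, the Python sketch below splits a made-up tab-delimited record two ways. This is an analogy, not SAS itself.

- Python's `csv` module, like DSD, keeps an empty field between two adjacent tabs.
- Splitting on whitespace collapses consecutive delimiters into one, which is what SAS list input does without DSD.

The shifted fields in the second result are what throw an INPUT statement out of alignment.

```python
import csv

line = "2000\tA\t\tB\t1,234.5"          # made-up record: two adjacent tabs = an empty field

# Like DSD: adjacent delimiters delimit an empty (missing) field.
dsd_like = next(csv.reader([line], delimiter="\t"))
# Like list input without DSD: runs of delimiters collapse into one.
collapsed = line.split()

print(dsd_like)    # ['2000', 'A', '', 'B', '1,234.5']
print(collapsed)   # ['2000', 'A', 'B', '1,234.5']
```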

Consider the following input statement: input year x1 $ x2 $ X3 $ date yyq6. x4 $ GDP $; This would read a numeric year, three character variables X1 X2 X3, the quarterly date, a character variable X4, and the quarterly GDP that we want to analyze. After modifying the infile statement appropriately, try this. Did it work? What is the purpose of variable X3?

Next, note that we cannot analyze GDP as a character variable, so we want to read it as a numeric variable; but it contains commas. Use the COMMA8.1 informat in place of the dollar sign ($) for GDP. What happens? Try adding a colon, as in GDP : comma8.1. The colon tells SAS to skip to the first non-blank entry and then apply comma8.1. Nothing to hand in here.
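The Python sketch below gives a rough analogue of what a COMMAw.d informat does with one field. It is an approximation for illustration only: the real informat also strips other embedded characters, and the SAS documentation gives the exact rules. The sketch applies three rules:

- remove commas and dollar signs;
- treat a blank field as missing;
- divide by 10**d, but only when the field has no decimal point.

The last rule, as we understand the informat, is why the .1 in comma8.1 matters for values typed without a decimal point. The name `read_comma` is our own.

```python
def read_comma(field, d=1):
    """Rough analogue of the SAS COMMAw.d informat (width w ignored here):
    strip commas and dollar signs, return None for a blank field, and
    divide by 10**d only when the field has no decimal point."""
    s = field.strip().replace(",", "").replace("$", "")
    if s == "":
        return None                 # SAS would store a missing value (.)
    value = float(s)
    return value if "." in s else value / 10 ** d

print(read_comma("1,234.5"), read_comma("12,345"), read_comma("  "))
```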

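[2.3] Compute the natural log of GDP and the first difference of ln(GDP), and plot each against a nicely formatted date. Use PROC GPLOT and connect the points. This code may help: if gdp=. then delete; LGDP = log(GDP); DGDP = dif(LGDP); keep date DGDP LGDP; Deleting the rows with missing GDP before DIF is called means that a blank row in the middle of the file cannot leave a missing value in the next row's difference.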
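Outside SAS, the same transformation takes only a few lines. The Python sketch below uses hypothetical records and names. It drops missing values before differencing, so each difference is taken between consecutive non-missing observations.

```python
from math import log

def log_and_diff(records):
    """records: list of (date, gdp) pairs, with gdp None when missing.
    Drops missing rows first, then returns (date, ln(gdp), first difference)."""
    kept = [(d, g) for d, g in records if g is not None]
    out, prev = [], None
    for d, g in kept:
        lg = log(g)
        out.append((d, lg, None if prev is None else lg - prev))
        prev = lg
    return out

rows = log_and_diff([("q1", 100.0), ("blank", None), ("q2", 110.0)])   # made-up values
print(rows)
```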

[2.4] Using PROC ARIMA, compute the autocorrelations of the log transformed GDP and its first differences. Also plot both series (logs and the difference of logs). Comment on your visual impression of the stationarity of these series.

[2.5] The rate of growth of GDP is of interest to economists. Explain why, then, they would be interested in the difference of log transformed GDP and its mean.
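The key identity is ln(Y(t)) - ln(Y(t-1)) = ln(1 + g(t)), where g(t) is the period-to-period growth rate. For small g(t), ln(1 + g(t)) is approximately g(t). The Python sketch below compares the exact growth rates with the log differences for a few made-up levels.

```python
from math import log

def growth_rates(levels):
    """Exact period-to-period growth rates and their log-difference approximations."""
    exact = [levels[t] / levels[t - 1] - 1 for t in range(1, len(levels))]
    logdiff = [log(levels[t]) - log(levels[t - 1]) for t in range(1, len(levels))]
    return exact, logdiff

exact, approx = growth_rates([100.0, 101.0, 102.5])   # made-up levels, about 1-1.5% growth
print(exact, approx)
```

The two lists agree to within a few thousandths of a percentage point. Averaging the log differences therefore estimates the average growth rate per quarter.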

/***********************************************************/
[PART OPTIONAL]

OPTIONAL: If you like, you can try importing that data with the SAS import wizard: "File -> Import Data".

OPTIONAL: Give the SAS code for importing an excel document directly into SAS. Do you need to manually format the excel document prior to its importation into SAS?

OPTIONAL: Still interested in analyzing time series data? Check out the following website: Time Series Data Library.

/***********************************************************/
[APPENDIX TO HOMEWORK 3]

Turn in your COMPLETE SAS program from the enhanced editor window.

/***********************************************************/
[LINKS]

SAS Online Documentation