Time Series homework 2
Regression with Time Series Errors
Questions
(Note: click on SAS Code at the bottom to see the initial
SAS code for this problem. Copy and paste it into SAS, then
modify as needed.)
Data were supplied by meteorology graduate student Bill Barnard (also of
the USEPA). Data contain information on black carbon in the atmosphere
and PM-10 among other things. PM-10 is the amount of particulate matter
(PM) in the air that would be trapped by a filter with a certain pore size
(10 micrometers). I asked Mr. Barnard to explain the other items in the dataset.
In his words:
"... It is taken from the Southern California Ozone Study
that was done in 1997. It was a very heavily instrumented intensive study
done in the LA Basin. The data here is from Riverside, CA. DUV is an
integrated daily dosage of UV radiation in the 286-400 nanometer wavelength
region. TOMS is an acronym for Total Ozone Mapping Spectrophotometer that
NASA uses on the NIMBUS satellite to measure, among other things, the
stratospheric ozone levels all over the earth. It is in polar orbit. It
gives only one daily value for just about any location almost every day.
Black carbon is just what you said. Its units are micrograms/cubic meter.
PM-10 is the same thing as your statement. Particulate matter, 10
micrometers or less in diameter that is collected on an hourly basis. The
ozone value is a measurement of the ground-level ozone pollution in parts per
million."
- The data use so-called Julian dates. The Julian date for Feb. 2,
1998 is 1998033 because Feb. 2 is day 033 of the year. Sometimes
only the day of the year (033) is recorded rather than the full value. That is
the case here, but we know the year is 1997.
Create a SAS date variable with format date7. Print out the first 5
observations from your dataset with a nice descriptive title.
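The day-of-year arithmetic behind the date conversion can be sketched as follows. This is only an illustration in Python, not the SAS step you are asked to write; the function names `day_of_year_to_date` and `date7` are made up for this sketch, and `date7` merely imitates how a date7. value is displayed.

```python
from datetime import date, timedelta

# Month abbreviations spelled out so the output does not depend on locale.
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def day_of_year_to_date(day: int, year: int = 1997) -> date:
    """Convert a 1-based day of the year into a calendar date.

    The year defaults to 1997 because that is the year of this dataset.
    """
    return date(year, 1, 1) + timedelta(days=day - 1)


def date7(d: date) -> str:
    """Render a date as ddMONyy, imitating the look of a date7. value."""
    return f"{d.day:02d}{MONTHS[d.month - 1]}{d.year % 100:02d}"


# The example from the text: day 033 of 1998 is Feb. 2.
print(date7(day_of_year_to_date(33, 1998)))  # 02FEB98
```

The same idea carries over to SAS: add (day - 1) to the date value for January 1 of the known year, then attach the display format.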
- Plot all the meteorological variables versus your nicely formatted date.
You can put each plot on a different page, but if you want to learn more about
SAS you can use a template.
- Run a correlation among all the meteorological variables. What other
variables are highly correlated with DUV, the response of interest?
Use PROC CORR in SAS.
What assumptions are usually made when p-values are computed for
correlations? Ignoring the normality assumption, what other assumption is
likely to be violated when data are taken over time like this?
- Regress DUV on the other meteorological variables using PROC REG.
Output the residuals r. Plot r against Lr=lag(r), i.e. r against its lagged value.
You might try a gplot here along with
SYMBOL1 V=DOT I=R C=RED;
Check the SAS log window here. I=JOIN connects points with lines in the
order encountered and I=NONE is obvious, but what does I=R do?
Make a histogram of the residuals. PROC GCHART or PROC CHART
would be good choices here.
- Regress r on 3 of its lags. Assuming that the regression statistics are valid in
large samples (see Fuller's text for a proof that they are, under rather mild
assumptions), give the F test of whether you can leave out all but the first lag.
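The F test for dropping a group of regressors compares the error sum of squares of the full and reduced models. Here is a minimal Python sketch of that arithmetic. The function name `partial_f` and the numbers in the example are hypothetical and are not taken from the homework data.

```python
def partial_f(sse_reduced: float, sse_full: float,
              n_dropped: int, df_error_full: int) -> float:
    """F statistic for testing that the dropped coefficients are all zero.

    F = [(SSE_reduced - SSE_full) / n_dropped] / [SSE_full / df_error_full]
    """
    numerator = (sse_reduced - sse_full) / n_dropped
    denominator = sse_full / df_error_full
    return numerator / denominator


# Hypothetical numbers: dropping 2 lags raises SSE from 110 to 120,
# and the full model has 55 error degrees of freedom.
print(partial_f(sse_reduced=120.0, sse_full=110.0,
                n_dropped=2, df_error_full=55))  # 2.5
```

Both models must be fit to the same observations, that is, only those rows where all three lags are nonmissing. Compare the result with an F distribution on (n_dropped, df_error_full) degrees of freedom.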
- Regardless of your answer to (5), the regression of r on 3 of its lags, regress r on Lr,
where Lr is the lag of r, and, again assuming the test statistics are valid, discuss the
statistical significance. This is one way to estimate an AR(1) structure for
the residuals. Regardless of significance here, write down (to, say, 3
decimal places) the estimated 4x4 Toeplitz covariance matrix
of any set of 4 contiguous residuals using your estimated AR(1) structure.
What is your conclusion about the regression in part (4), the PROC REG regression of DUV
on the other meteorological variables? Specifically, are
your estimated coefficients unbiased? Can I trust the standard errors? Can
I trust the p-values for my test statistics?
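For an AR(1) process r(t) = rho r(t-1) + e(t) with Var( e(t) ) = sigma2, the autocovariance at lag h is sigma2 rho^|h| / (1 - rho^2), which is what makes the covariance matrix Toeplitz. A minimal Python sketch follows. The function name and the values rho = 0.5, sigma2 = 3 are illustrative assumptions, not estimates from the homework data.

```python
def ar1_covariance(rho: float, sigma2: float, size: int = 4) -> list:
    """Covariance matrix of `size` contiguous values of a stationary AR(1).

    rho    : lag-1 autoregressive coefficient, must satisfy |rho| < 1
    sigma2 : variance of the white-noise shocks e(t)
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("a stationary AR(1) needs |rho| < 1")
    gamma0 = sigma2 / (1.0 - rho ** 2)  # variance of a single value
    # Entry (i, j) depends only on |i - j|, hence the Toeplitz pattern.
    return [[gamma0 * rho ** abs(i - j) for j in range(size)]
            for i in range(size)]


# Illustrative values only: rho = 0.5 and sigma2 = 3 give gamma(0) = 4.
for row in ar1_covariance(0.5, 3.0):
    print([round(v, 3) for v in row])
```

In the homework, rho would be the slope from regressing r on Lr. One reasonable choice for sigma2 is the error mean square from that same regression.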
- Rerun the regression from part (4), DUV on the other meteorological variables, changing
PROC REG to PROC AUTOREG. Ask for the Durbin-Watson
statistic's p-value. What is the DW statistic, what is its p-value, and what
is the implication?
At the end of your MODEL statement, before the semicolon, put
/ NLAG=3 BACKSTEP
This will fit 3 lags to r and then, using t tests, eliminate the insignificant lags (BACKSTEP).
How does the initial regression compare to PROC REG?
The procedure has used the estimated autocorrelation structure to fit a generalized
least squares (GLS) regression, or more specifically an estimated GLS (EGLS) regression.
Summarize for the client how the coefficient and p-value on black carbon (the focus
of his thesis) change when you correct for autocorrelation like this.
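To see what the Durbin-Watson statistic measures, here is a minimal Python sketch of its definition: the sum of squared successive differences of the residuals divided by their sum of squares. The residual values are made up for illustration. The p-value is not computed here, since that is what you ask PROC AUTOREG for.

```python
def durbin_watson(resid: list) -> float:
    """Durbin-Watson statistic: sum (r(t) - r(t-1))^2 / sum r(t)^2."""
    numerator = sum((resid[t] - resid[t - 1]) ** 2
                    for t in range(1, len(resid)))
    denominator = sum(r * r for r in resid)
    return numerator / denominator


# Made-up residuals that drift slowly, so neighbors resemble each other.
print(durbin_watson([1.0, 0.5, -0.5, -1.0, 0.0]))  # 1.0
```

DW is roughly 2(1 - r1), where r1 is the lag-1 autocorrelation of the residuals. Values near 2 suggest little first-order autocorrelation, and values well below 2 suggest positive autocorrelation.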
- Using t test statistics, eliminate one at a time, starting with the least significant,
the terms in the model that are not significant at the 10% level. Use PROC AUTOREG.
Just hand in a summary indicating which term was omitted at each step, and its
EGLS p-value. I believe you will find a model that contains black carbon (and maybe other
things), but its p-value is not less than 0.05.
Now the presence of black carbon in the atmosphere could not possibly increase
the amount of radiation DUV. It could only decrease it. Explain how prior knowledge
of this fact could be helpful to our client who is interested in showing an effect
of black carbon on DUV. Use the final model you got by the model fitting you just did.
Short questions:
- Here are some theoretical autocorrelations. Give the AR, MA, or ARMA model
that would give these autocorrelations.
Lag       0   1    2    3     4      5       6  7  8  9  10
Model I   1   .5   .25  .125  .0625  .03125  .....
Model II  1   0.2  0    0     0      0       .....
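To build intuition for the patterns in such a table, the sketch below computes theoretical autocorrelations for two simple models. The parameter values (0.8 and 0.5) are chosen for illustration only and deliberately differ from those needed for Model I and Model II. The MA(1) here is written with a plus sign, Y(t) = e(t) + theta e(t-1), so flip the sign of theta if your convention uses a minus.

```python
def ar1_acf(phi: float, max_lag: int) -> list:
    """Autocorrelations of Y(t) = phi*Y(t-1) + e(t): rho(h) = phi**h."""
    return [phi ** h for h in range(max_lag + 1)]


def ma1_acf(theta: float, max_lag: int) -> list:
    """Autocorrelations of Y(t) = e(t) + theta*e(t-1).

    rho(1) = theta / (1 + theta**2), and rho(h) = 0 for every h > 1.
    """
    rho1 = theta / (1.0 + theta ** 2)
    acf = [1.0, rho1] + [0.0] * (max_lag - 1)
    return acf[:max_lag + 1]


# Illustrative parameters only.
print([round(v, 4) for v in ar1_acf(0.8, 5)])  # geometric decay
print([round(v, 4) for v in ma1_acf(0.5, 5)])  # cuts off after lag 1
```

An AR(1) autocorrelation function dies off geometrically, while an MA(1) autocorrelation function is exactly zero beyond lag 1.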
- A moving average order 1 model has mean 90 and error variance Var( e(t) ) = 100.
The model is Y(t) - 90 = e(t) - .8 e(t-1). My last two observations are Y(99) = 105
and Y(100) = 98. I do not care about the next observation, Y(101), but I do want to
predict the average ( Y(102)+Y(103)+Y(104)+Y(105) )/4.
- What would be the best predictor (BLUP) of this average of future values, assuming
all the given model parameters are known values, not estimates?
- Find the variance of an individual Y value, Gamma(0).
- Find the variance of the mean of the 4 values given above. Note that they are NOT
uncorrelated with each other.
- Find the correlation of that mean of 4 with each past data value Y(1), ..., Y(100) and, if
needed, go back and correct your answer to the first part!
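The variance of a mean of correlated values is the sum of all the pairwise covariances divided by the squared number of values. The Python sketch below shows this for an MA(1) written as above, Y(t) - mean = e(t) - theta e(t-1). The parameter values (theta = 0.5, error variance 4, 3 values) are illustrative assumptions and deliberately differ from those in the problem.

```python
def ma1_autocov(theta: float, sigma2: float, lag: int) -> float:
    """Autocovariance of Y(t) - mean = e(t) - theta*e(t-1) at a given lag."""
    lag = abs(lag)
    if lag == 0:
        return (1.0 + theta ** 2) * sigma2
    if lag == 1:
        return -theta * sigma2
    return 0.0  # an MA(1) has no covariance beyond lag 1


def var_of_mean(theta: float, sigma2: float, m: int) -> float:
    """Variance of the average of m consecutive values of the MA(1)."""
    total = sum(ma1_autocov(theta, sigma2, i - j)
                for i in range(m) for j in range(m))
    return total / m ** 2


# Illustrative parameters only: gamma(0) = 5 and gamma(1) = -2,
# so the sum of the 9 covariances is 3*5 + 4*(-2) = 7.
print(var_of_mean(theta=0.5, sigma2=4.0, m=3))  # 7/9, about 0.778
```

The same double sum over autocovariances also gives the covariance between the future mean and any past Y value, which is what the last part asks you to examine.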
SAS Code
SAS Online Documentation