
To introduce students to statistical
models and methods for the analysis of longitudinal data, i.e., data
collected repeatedly on individuals (humans, animals, plants, samples,
etc.) over time (or other conditions).
Course prerequisites
ST 512, Experimental Statistics for Biological Sciences II, or equivalent.
Thus, students should be familiar with basic notions of probability,
random variables, and statistical inference; analysis of variance; and
(multiple) linear regression. Familiarity with matrix algebra is also
useful. We will review matrix algebra at the beginning of the course
and make considerable use of matrix notation and operations
throughout. ST 512 involves the use of the SAS (Statistical Analysis
System) software package; thus, students are expected to have had some
exposure to the use of SAS. The course is meant to be accessible both
to non-majors and majors. The underlying mathematical theory will not
be stressed, and the main focus will be on concepts and applications.
Please see the instructor if you have questions about the
suitability of your background.
Course topics
- Preliminaries: Introduction, review of matrix algebra, random vectors, multivariate normal distribution, review of linear regression
- Classical methods for normally distributed, balanced repeated measurements: Univariate repeated measures analysis of variance, multivariate repeated measures analysis of variance, drawbacks and limitations of classical methods
- Methods for normally distributed, unbalanced repeated measurements: General linear models and models for correlation, random coefficient models, linear mixed effects models, population-averaged vs. subject-specific modeling
- Methods for non-normally distributed, unbalanced data: Probability models for discrete and continuous nonnormal data and generalized linear models, generalized estimating equations for population-averaged models
- Advanced topics (quick overview): Generalized linear mixed effects models, nonlinear mixed effects models, missing data mechanisms
See the class notes below for more detailed information.
Tentative syllabus
Teaching Assistant
- Lihua Tang, Office hours M 11:30 am - 12:30 pm, 9 Patterson Hall,
ltang@ncsu.edu
Class notes

Class notes (in pdf format) will be available here in
March 2007; if you are taking this class in Spring 2007, you will need
to purchase the notes at the
NCSU Bookstores.
Homework assignments and tentative due dates
- Homework 1, due Tuesday, February 6, data set for problem 6, and data set for problem 7
- Homework 2, due Thursday, February 15, and data set for problem 2
- Homework 3, due Tuesday, March 13, data set for problem 3, and data set for problem 4
- Homework 4, due Tuesday, March 27, and data set for problem 3
- Homework 5, due Tuesday, April 10, data set for problem 1, and data set for problems 2 and 3
- Homework 6, due Thursday, April 26, data set for problem 1, data set for problem 2, data set for problem 3, and data set for problem 4
Homework solutions

- Homework 1 Solutions, program for problems 5 and 6, output for problems 5 and 6, program for problem 7, output for problem 7, R program to create scatterplots for each insulin group, scatterplot for group 1, scatterplot for group 2, and scatterplot for group 3
- Homework 2 Solutions, program for problem 2, output for problem 2, R program to create spaghetti plots, spaghetti plots, plot of group sample mean vectors, R program to create scatterplots, scatterplot for group 1, scatterplot for group 2, scatterplot for group 3, and scatterplot for group 4
- Homework 3 Solutions, program for problem 3, output for problem 3, program for problem 4, and output for problem 4
- Homework 4 Solutions, program for problem 3, and output for problem 3
- Homework 5 Solutions, program for problem 1, output for problem 1, program for problem 2, output for problem 2, program for problem 3, and output for problem 3
- Homework 6 Solutions, program for problem 1, output for problem 1, program for problem 2, output for problem 2, program for problem 3, output for problem 3, program for problem 4, and output for problem 4
Tests
- Test 1 took place Tuesday, March 20 and covered Chapters 1 to 8 of the notes (through Homework 3 -- Chapter 6 not included). Here are the solutions, a histogram of the grades (scores are out of 100), and summary statistics: Mean = 84.5, Median = 86, Standard deviation = 12.6, n = 27.
- Test 2 took place Thursday, May 3 and covered everything since Test 1. Here are the solutions, a histogram of the grades (scores are out of 100), and summary statistics: Mean = 90.9, Median = 92.5, Standard deviation = 7.2, n = 26.
Data analysis project
SAS on-line documentation
SAS and R examples (in class notes)
- Plotting using R: The R language is an open source
computing environment for statistics and graphics that is available
here at NCSU. (R is an open source implementation of the S language,
which also underlies S-PLUS.) Although we use
SAS in this course to carry out analyses (which can also be done in R,
by the way), the instructor vastly prefers R for making graphics. So
all of the plots in the course notes and in homework solutions are
created with R. Here is a sample R
program that reads in the dental study data introduced in Chapter
1 of the notes and creates two plots (output to pdf files): a two-panel
spaghetti plot of the girl and boy data separately with sample mean
profiles superimposed, and a spaghetti plot of the girl and
boy data together using the gender indicator as the plotting
symbol, as in Figure 1 of Chapter 1.
- Dental data: The dental study
data in Example 1 of Chapter 1, which we analyze repeatedly in later
chapters for illustration, are world-famous and used by many authors
discussing longitudinal data methods. Here is a picture of the
pterygomaxillary fissure.
- Chapter 4, Example 1
(computation of sample mean vectors, sample covariance and correlation
matrices, pooled covariance and correlation matrices, data for
scatterplot matrices, lag plots, and autocorrelation functions using
PROCs MEANS, CORR, DISCRIM, GLM, and MIXED for the dental data): program,
output, and data set. The program outputs the
centered/scaled distances in the file dentcenter.dat, which may be read
into the R program dentscatter.R
to obtain the scatter plots for girls and boys.
A SAS program shows how to
call PROC INSIGHT to make the scatter plots for girls and boys.
- Chapter 5, Example 1 (analysis of dental data by univariate repeated measures analysis of variance, PROC GLM): program, output, and data set
- Chapter 5, Example 2 (analysis of guinea pig diet data by univariate repeated measures analysis of variance, PROC GLM): program, output, and data set
- Chapter 6, Example 1 (analysis of dental data by multivariate repeated measures analysis of variance, PROC GLM): program, output, and data set
- Chapter 6, Example 2 (analysis of guinea pig diet data by multivariate repeated measures analysis of variance, PROC GLM): program, output, and data set
- Chapter 8, Example 1 (analysis of dental data using general linear population averaged regression model, PROC MIXED): program, output, and data set
- Chapter 8, Example 2 (analysis of ultrafiltration data using general linear population averaged regression model, PROC MIXED): program, output, and data set
- Chapter 8, Example 3 (analysis of hip replacement data using general linear population averaged regression model, PROC MIXED): program, output, and data set
- Chapter 9, Example 1 (analysis of dental data using a random coefficient model, PROC MIXED): program, output, and data set
- Chapter 9, Example 2 (analysis of ultrafiltration data using a random coefficient model, PROC MIXED): program, output, and data set
- Chapter 10, Example 1 (analysis of dental data using linear mixed effects model, PROC MIXED): program, output, and data set
- Chapter 10, Example 2 (analysis of weight-lifting data using a linear mixed effects model, PROC MIXED): program, output, and data set
- Chapter 11, Example 1 (analysis of horsekick data using a generalized linear model, PROC GENMOD): program, output, and data set
- Chapter 11, Example 2 (analysis of myocardial infarction data using a generalized linear model, PROC GENMOD): program, output, and data set
- Chapter 11, Example 3 (analysis of clotting time data using a generalized linear model, PROC GENMOD): program, output, and data set
- Chapter 12, Example 1 (analysis of epileptic seizure data using a population-averaged model and GEE, PROC GENMOD): program, output, and data set
- Chapter 12, Example 2 (analysis of wheezing data using a population-averaged model and GEE, PROC GENMOD): program, output, and data set
Errata list
The errata list will be updated as we find typos!
Announcements (most recent shown first)
- GRADED TEST 2 available in 220 Patterson Hall on Sasha's desk.
Grades should be available on-line Tuesday am, May 8.
- TEST 2 will be held in 208 PATTERSON HALL (NOT the classroom)
during the scheduled exam period on Thursday, May 3, 8:00 - 11:00 am.
Test 2 will cover Chapters 9, 10, 11, and 12. As with Test 1, you will
be allowed ONE 8.5 x 11 inch sheet of HANDWRITTEN notes, ONE SIDE OF
THE SHEET ONLY.
- The data set links for Homework 5 were previously INCORRECT and
have been changed (as of 4/5/07).
- Our TA, Lihua Tang, is taking her preliminary oral exam on Monday,
3/26. Hence, she will not hold office hours at her usual time this
week; instead, she will hold office hours on Thursday 3/29 from 12:00 noon
to 1:00 pm.
- There is a TYPO at the bottom of page 305 of the notes. The displayed
model statement should be:
model ufr = c1 c2 c3 tmp c1*tmp c2*tmp c3*tmp / solution;
- TEST 1 will take place TUESDAY, MARCH 20, in 208 PATTERSON HALL, from
6:00 - 8:00 pm. You are allowed ONE 8.5 x 11 inch sheet of HANDWRITTEN
notes, ONE SIDE OF THE SHEET ONLY.
The test will cover Chapters 1-8, except for Chapter 6, which we did not
cover.
- We WILL have class at the usual time on TUESDAY, MARCH 20. I will
have my usual office hours from 10 - 11 after class.
- Graded Homework 3 will be available after 9:30 am in 220 Patterson
Hall. The solutions are posted above.
- Our TA, Lihua Tang, will also be out of town next week; hence,
she will not be able to hold her regular office hours on Monday,
March 12, 11:30 - 12:30. She has asked me to let you know this and that
she would be happy to take questions by email.
- Because of the change in the test date, I have changed the due
date for Homework 3 to be Tuesday, March 13 (after spring break).
There is no class this day, but YOUR HOMEWORK MUST BE TURNED IN BY
NOON ON THE 13TH TO SASHA MIAO IN 220 PATTERSON HALL!!! No late
homework accepted for reasons other than dire emergencies.
- There will be NO CLASS on Tuesday, March 13 (Tuesday after spring
break).
- Date of Test 1. As we discussed in class on February 22,
the date of the test is now set as Tuesday, March 20 -- details
will be given in class.
- Typo in Homework 3, Problem 3. In Problem 3, the coding of
the treatment indicator is noted incorrectly. It should say (at the
bottom of page 4):
Treatment indicator (=1 if placebo, =2 if low dose, =3 if high dose)
- Typo in Homework 2, Problem 1(e). In Problem 1(e), it
should say "q=4 groups" (not "times").
- Typo in Homework 2, Problem 2(g). In Problem 2(g), it should
read "Returning to the issue in (d)..." (that is, it should refer to (d)
instead of (c)).
- Forgotten Homework 2: Hard copies of Homework 2 were
supposed to be handed out on Tuesday, February 6, but were not. These
will be handed out on Thursday, February 8. In the meantime, you may
download a copy above.
- SAS procedure for standardizing variables: There is a
procedure called PROC STDIZE that will automatically standardize
variables. See the updated version of the program for Example 1 in
Chapter 4 for demonstration of its use.
- Typos in Homework 1, problem 7(a): In this problem, time
is in units of "hours." In part (a)(ii), "day" should be "time (hours)."
In part (a)(iv), "month" should be "time (hours)."
- Homework 1: Several of you have asked about fitting
model (3) in problem 5(e) of the homework. At the bottom of page 3,
you are asked to write a SAS program that reads in the data and then
fits both of models (2) and (3). Here are some comments:
(i) When you read the data in from the file, given that you are fitting a
regression model in the ACTUAL times t_j, you
should not be transforming the times from the age scale (8,10,12,14)
to (1,2,3,4) as in the example code at the end of the chapter --
otherwise, you will not be fitting the correct models.
(ii) When you fit model (3), the problem says to "call proc glm as above,
but now use instead the model statement"
model distance = gender age gender*age / solution;
This means that you should KEEP the class statement you used to fit
model (2), and just replace the model statement for model (2) with
this one. If you also eliminate the class statement, you will not be
fitting model (3). The point here is that the way SAS procedures
parameterize models by default when a class statement is used to set
up indicator ("dummy") variables for a covariate that takes on
"categorical" values (like 0 and 1) may not be as you expect. (There
are ways to override the default; if you are interested, see the SAS
documentation for proc glm).
- Chapter 4, Example 1: There is an alternative and much
easier way to transform data that are stored in the format of one data
record (line) per observation to that of one data record (line) per
individual:
data dent1; infile 'dental.dat';
input obsno child age distance gender;
run;
proc transpose data=dent1 out=dent2 prefix=age;
by gender child notsorted;
var distance;
run;
Thanks to Laine Elliott for sending code to show that proc transpose can
do this! The Example 1, Chapter 4 programs have been updated to include
this code.
- Chapter 4, Example 1: Other ways to get the pooled sample
covariance and pooled sample correlation matrices: I deliberately lied
to you yesterday when I said that the only way to get these matrices
in SAS is using proc discrim. It is also possible to get them using
proc mixed (which we will introduce later in the course) and (almost)
using proc glm with the manova option. The program for Example 1,
Chapter 4 has been updated to add code to show how this may be
accomplished.
- The due date for Homework 1 has been CHANGED to Tuesday, February 6.
This change is reflected above.
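
As a reminder of what the pooled sample covariance and correlation matrices mentioned in the Chapter 4, Example 1 announcement above are, here is the usual textbook definition. The notation is an assumption made for this sketch and may not match the class notes: q groups, r_l individuals in group l, m = r_1 + ... + r_q individuals in total, Y_{li} the data vector for individual i in group l, and \bar{Y}_l the sample mean vector for group l.

```latex
% Sample covariance matrix within group l (assumed notation, see above)
\hat{\Sigma}_{\ell} = \frac{1}{r_{\ell} - 1} \sum_{i=1}^{r_{\ell}}
  (Y_{\ell i} - \bar{Y}_{\ell})(Y_{\ell i} - \bar{Y}_{\ell})^{T}

% Pooled sample covariance matrix: weighted average of the group matrices
\hat{\Sigma}_{\mathrm{pooled}} = \frac{1}{m - q} \sum_{\ell=1}^{q}
  (r_{\ell} - 1)\, \hat{\Sigma}_{\ell},
  \qquad m = \sum_{\ell=1}^{q} r_{\ell}

% Pooled sample correlation matrix: rescale by the pooled standard deviations
\hat{\Gamma}_{\mathrm{pooled}} = D^{-1/2}\, \hat{\Sigma}_{\mathrm{pooled}}\, D^{-1/2},
  \qquad D = \mathrm{diag}(\hat{\Sigma}_{\mathrm{pooled}})
```

Pooling in this way assumes that the groups share a common covariance matrix, so that each group's matrix estimates the same quantity.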