/****************************************************************** CHAPTER 12, EXAMPLE 2 Fit a logistic regression model to the "wheezing" data. These are binary data, thus, we use the Bernoulli (bin) mean/variance assumptions. The model is fitted with different working correlation matrices. ******************************************************************/ options ls=80 ps=59 nodate; run; /****************************************************************** The data look like (first 4 records): 1 portage 9 0 1 10 0 1 11 0 1 12 0 0 2 kingston 9 1 1 10 2 1 11 2 0 12 2 0 3 kingston 9 0 1 10 0 0 11 1 0 12 1 0 4 portage 9 0 0 10 0 1 11 0 1 12 1 0 ... column 1 child column 2 city columns 3-5 age=9, smoking indiciator, wheezing response columns 6-8 age=10, smoking indiciator, wheezing response columns 9-11 age=11, smoking indiciator, wheezing response columns 12-14 age=12, smoking indiciator, wheezing response Some of the children have missing values for smoking and wheezing, as shown in Chapter 1. There are 32 children all together. See the output for the full data printed out one observation per line. We read in the data using the "@@" symbol so that SAS will continue to read for data on the same line and the OUTPUT statement to write each block of three observations for each age in as a separate data record. The resulting data set is one with a separate line for each observation. City is a character variable, so the dollar sign is used to read it in as such. ******************************************************************/ data wheeze; infile "wheeze.dat.txt"; input child city $ @@; do i=1 to 4; input age smoke wheeze @@; output; end; run; proc print data=wheeze; run; /*********************************** The missing values are coded in the usual SAS way by periods (.). We delete these from the full data set, so that the data set input to PROC GENMOD contains only the observed data. We assume that the fact that these observations are missing has nothing to do with the thing under study (which may or may not be true). ***********************************/ /* define a new variable TIME */ data wheeze; set wheeze; if wheeze=. then delete; time=age; age2=age-11; run; proc print data=wheeze; run; /***************************************************************** The MODELSE option specifies that the standard error estimates printed for the elements of betahat are based on the usual theory. By default, the ones based on the "robust" version of the sampling covariance matrix are printed, too. Thus, because these data are not balanced, we use the WITHIN option of the REPEATED statement to give SAS the time variable AGE as a classification variable so that it can figure out where the missing values are and use this information in estimating the correlation matrix. Note: The variable defining the within-subject effect must be listed on the CLASS statement. ******************************************************************/ /**************************************************************** Assume marginal probability of infection follows the following logistic model: 1. logit(mu_ij) = beta0 + beta1*age2ij + + beta2*city + beta3*smoke1_ij+ + beta4*smoke1_ij 2. var(mu_ij) = mu_ij (1-mu_ij) PHI set to 1 3. Model the correlation between pairs (Y_ij, Y_ik): LOGOR or PEARSON CORRELATION ******************************************************************/ proc means data=wheeze; class age smoke; run; title "EXCHANGEABLE STRUCTURE FOR THE ODDS RATIO"; proc genmod data=wheeze descending; class child city smoke time; model wheeze = age2 city smoke / dist=bin link=logit; repeated subject=child / corrw covb modelse within=time logor=EXCH; /* Model association using odds ratio*/ /* You should specify either the LOGOR= or the TYPE= option, but not both. */ run; /****************************************************************** XCHANGEABLE STRUCTURE FOR THE ODDS RATIO 9 The GENMOD Procedure Model Information Data Set WORK.WHEEZE Distribution Binomial Link Function Logit Dependent Variable wheeze Number of Observations Read 100 Number of Observations Used 100 Number of Events 29 Number of Trials 100 Class Level Information Class Levels Values child 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 city 2 kingston portage smoke 3 0 1 2 time 4 9 10 11 12 Response Profile Ordered Total Value wheeze Frequency 1 1 29 2 0 71 PROC GENMOD is modeling the probability that wheeze='1'. Parameter Information Parameter Effect city smoke Prm1 Intercept Prm2 age2 Prm3 city kingston Prm4 city portage Prm5 smoke 0 Prm6 smoke 1 Prm7 smoke 2 EXCHANGEABLE STRUCTURE FOR THE ODDS RATIO 10 The GENMOD Procedure GEE Model Information Log Odds Ratio Structure Exchangeable Within-Subject Effect time (4 levels) Subject Effect child (32 levels) Number of Clusters 32 Correlation Matrix Dimension 4 Maximum Cluster Size 4 Minimum Cluster Size 1 Covariance Matrix (Model-Based) Prm1 Prm2 Prm3 Prm5 Prm6 Prm1 0.3208452 0.017826 -0.116126 -0.243242 -0.229867 Prm2 0.017826 0.036441 -0.005956 0.012697 0.0061977 Prm3 -0.116126 -0.005956 0.2651187 -0.026425 -0.021605 Prm5 -0.243242 0.012697 -0.026425 0.4189291 0.2564586 Prm6 -0.229867 0.0061977 -0.021605 0.2564586 0.3571251 Covariance Matrix (Empirical) Prm1 Prm2 Prm3 Prm5 Prm6 Alpha1 Prm1 0.2416633 0.0391736 -0.077757 -0.171266 -0.269155 0.010573 Prm2 0.0391736 0.0435805 0.0225841 -0.023047 -0.034064 0.0145666 Prm3 -0.077757 0.0225841 0.2460886 -0.025183 -0.007881 -0.011053 Prm5 -0.171266 -0.023047 -0.025183 0.3294239 0.2956445 -0.033304 Prm6 -0.269155 -0.034064 -0.007881 0.2956445 0.4711632 -0.031452 Alpha1 0.010573 0.0145666 -0.011053 -0.033304 -0.031452 0.1657002 Algorithm converged. GEE Fit Criteria QIC 128.3889 QICu 127.2035 Analysis Of GEE Parameter Estimates Empirical Standard Error Estimates Standard 95% Confidence Parameter Estimate Error Limits Z Pr > |Z| Intercept -0.5653 0.4916 -1.5288 0.3982 -1.15 0.2502 age2 -0.1844 0.2088 -0.5936 0.2247 -0.88 0.3770 city kingston 0.2721 0.4961 -0.7002 1.2444 0.55 0.5833 city portage 0.0000 0.0000 0.0000 0.0000 . . smoke 0 -0.4523 0.5740 -1.5772 0.6727 -0.79 0.4307 smoke 1 -0.8667 0.6864 -2.2121 0.4786 -1.26 0.2067 EXCHANGEABLE STRUCTURE FOR THE ODDS RATIO 11 Analysis Of GEE Parameter Estimates Empirical Standard Error Estimates Standard 95% Confidence Parameter Estimate Error Limits Z Pr > |Z| Intercept -0.5653 0.4916 -1.5288 0.3982 -1.15 0.2502 age2 -0.1844 0.2088 -0.5936 0.2247 -0.88 0.3770 city kingston 0.2721 0.4961 -0.7002 1.2444 0.55 0.5833 city portage 0.0000 0.0000 0.0000 0.0000 . . smoke 0 -0.4523 0.5740 -1.5772 0.6727 -0.79 0.4307 smoke 1 -0.8667 0.6864 -2.2121 0.4786 -1.26 0.2067 Analysis Of GEE Parameter Estimates Model-Based Standard Error Estimates Standard 95% Confidence Parameter Estimate Error Limits Z Pr > |Z| Intercept -0.5653 0.5664 -1.6755 0.5449 -1.00 0.3183 age2 -0.1844 0.1909 -0.5586 0.1897 -0.97 0.3340 city kingston 0.2721 0.5149 -0.7371 1.2813 0.53 0.5972 city portage 0.0000 0.0000 0.0000 0.0000 . . smoke 0 -0.4523 0.6472 -1.7208 0.8163 -0.70 0.4847 smoke 1 -0.8667 0.5976 -2.0380 0.3045 -1.45 0.1470 smoke 2 0.0000 0.0000 0.0000 0.0000 . . Alpha1 0.6242 . . . . . Scale 1.0000 . . . . . NOTE: The scale parameter was held fixed. ******************************************************************/ title "COMPOUND SYMMETRY (EXCHANGEABLE) CORRELATION"; proc genmod data=wheeze descending; class child city smoke time; model wheeze = age2 city smoke / dist=bin link=logit; repeated subject=child / type=cs corrw covb modelse within=time; run; /****************************************************************** COMPOUND SYMMETRY (EXCHANGEABLE) CORRELATION 15 The GENMOD Procedure Model Information Data Set WORK.WHEEZE Distribution Binomial Link Function Logit Dependent Variable wheeze Number of Observations Read 100 Number of Observations Used 100 Number of Events 29 Number of Trials 100 Class Level Information Class Levels Values child 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 city 2 kingston portage smoke 3 0 1 2 time 4 9 10 11 12 Response Profile Ordered Total Value wheeze Frequency 1 1 29 2 0 71 PROC GENMOD is modeling the probability that wheeze='1'. Parameter Information Parameter Effect city smoke Prm1 Intercept Prm2 age2 Prm3 city kingston Prm4 city portage Prm5 smoke 0 Prm6 smoke 1 Prm7 smoke 2 Algorithm converged. Covariance Matrix (Model-Based) Prm1 Prm2 Prm3 Prm5 Prm6 Prm1 0.31920 0.01848 -0.11749 -0.23984 -0.22600 Prm2 0.01848 0.03628 -0.006102 0.01247 0.005754 Prm3 -0.11749 -0.006102 0.26628 -0.02662 -0.02205 Prm5 -0.23984 0.01247 -0.02662 0.41474 0.25313 Prm6 -0.22600 0.005754 -0.02205 0.25313 0.35456 Covariance Matrix (Empirical) Prm1 Prm2 Prm3 Prm5 Prm6 Prm1 0.23643 0.03855 -0.07726 -0.16641 -0.26243 Prm2 0.03855 0.04312 0.02250 -0.02237 -0.03332 Prm3 -0.07726 0.02250 0.24505 -0.02561 -0.009006 Prm5 -0.16641 -0.02237 -0.02561 0.32559 0.29010 Prm6 -0.26243 -0.03332 -0.009006 0.29010 0.46383 Algorithm converged. Working Correlation Matrix Col1 Col2 Col3 Col4 Row1 1.0000 0.1331 0.1331 0.1331 Row2 0.1331 1.0000 0.1331 0.1331 Row3 0.1331 0.1331 1.0000 0.1331 Row4 0.1331 0.1331 0.1331 1.0000 Exchangeable Working Correlation Correlation 0.1330533182 Analysis Of GEE Parameter Estimates Empirical Standard Error Estimates Standard 95% Confidence Parameter Estimate Error Limits Z Pr > |Z| Intercept -0.5522 0.4862 -1.5052 0.4008 -1.14 0.2561 age2 -0.1874 0.2076 -0.5944 0.2196 -0.90 0.3667 city kingston 0.2553 0.4950 -0.7149 1.2255 0.52 0.6061 city portage 0.0000 0.0000 0.0000 0.0000 . . smoke 0 -0.4577 0.5706 -1.5761 0.6607 -0.80 0.4225 smoke 1 -0.8703 0.6811 -2.2052 0.4645 -1.28 0.2013 smoke 2 0.0000 0.0000 0.0000 0.0000 . . Analysis Of GEE Parameter Estimates Model-Based Standard Error Estimates Standard 95% Confidence Parameter Estimate Error Limits Z Pr > |Z| Intercept -0.5522 0.5650 -1.6595 0.5552 -0.98 0.3284 age2 -0.1874 0.1905 -0.5608 0.1859 -0.98 0.3251 city kingston 0.2553 0.5160 -0.7561 1.2667 0.49 0.6208 city portage 0.0000 0.0000 0.0000 0.0000 . . smoke 0 -0.4577 0.6440 -1.7199 0.8045 -0.71 0.4773 smoke 1 -0.8703 0.5954 -2.0374 0.2967 -1.46 0.1438 smoke 2 0.0000 0.0000 0.0000 0.0000 . . Scale 1.0000 . . . . . NOTE: The scale parameter was held fixed. ******************************************************************/ title "UNSTRUCTURED CORRELATION"; proc genmod data=wheeze descending; class child city smoke time; model wheeze = age2 city smoke / dist=bin link=logit; repeated subject=child / type=un corrw covb modelse within=time ; /* Model associations using correlation*/ run; title "AR(1) CORRELATION"; proc genmod data=wheeze descending; class child city smoke time; model wheeze = city smoke / dist=bin link=logit; repeated subject=child / type=ar(1) corrw covb modelse within=time; run; title "UNSTRUCTURED CORRELATION, NO City COVARIATE"; proc genmod data=wheeze descending; class child city smoke time; model wheeze = age2 smoke / dist=bin link=logit; repeated subject=child / type=un corrw covb modelse within=time ; /* Model associations using correlation*/ run; /************************************************************** PROC NLMIXED **************************************************************/ data wheeze2; set wheeze; smoke1 = 0; smoke2 = 0; if smoke=0 then smoke1=1; if smoke=1 then smoke2=1; run; proc print data=wheeze2; run; /************************************************************** PROC NLMIXED fits non linear mixed effects models QPOINTS option specifies the number of quadrature points to be used during evaluation of integrals. For METHOD=GAUSS, equals the number of points used in each dimension of the random effects, resulting in a total of points, where is the number of dimensions. For METHOD=ISAMP, specifies the total number of quadrature points regardless of the dimension of the random effects. By default, the number of quadrature points is selected adaptively, and this option disables the adaptive search. The PARMS statement lists names of parameters and specifies initial values, possibly over a grid. You can specify the parameters and values directly in a list, or you can provide the name of a SAS data set that contains them by using the DATA= option. **************************************************************/ proc nlmixed data=wheeze2 qpoints=50; parms beta0=-0.55 beta1=-.17 beta3=.42 beta4= .44 s2u=2.0; eta=beta0 + beta1*age2 + beta3*smoke1 +beta4*smoke2 + u; expeta=exp(eta); p=expeta/(1+expeta); model wheeze ~ binary(p); random u ~ normal(0, s2u) subject=child; run; /************************************************************** UNSTRUCTURED CORRELATION 129 The NLMIXED Procedure Specifications Data Set WORK.WHEEZE2 Dependent Variable wheeze Distribution for Dependent Variable Binary Random Effects u Distribution for Random Effects Normal Subject Variable child Optimization Technique Dual Quasi-Newton Integration Method Adaptive Gaussian Quadrature Dimensions Observations Used 100 Observations Not Used 0 Total Observations 100 Subjects 32 Max Obs Per Subject 4 Parameters 5 Quadrature Points 50 Parameters beta0 beta1 beta3 beta4 s2u NegLogLike -0.77 -0.2 0.44 0.33 2 62.9045997 Iteration History Iter Calls NegLogLike Diff MaxGrad Slope 1 2 60.0500317 2.854568 5.195508 -66.3636 2 4 59.2108023 0.839229 1.596643 -17.991 3 5 58.5305753 0.680227 1.142298 -1.17902 4 7 58.04261 0.487965 0.353487 -2.08134 5 8 57.9924863 0.050124 0.343923 -0.13333 6 10 57.9670275 0.025459 0.059179 -0.05588 7 12 57.9659445 0.001083 0.019116 -0.00255 8 14 57.9659054 0.000039 0.004024 -0.00009 9 16 57.9659043 1.093E-6 0.001049 -2.82E-6 10 18 57.9659042 8.663E-8 0.000132 -1.93E-7 NOTE: GCONV convergence criterion satisfied. Fit Statistics -2 Log Likelihood 115.9 AIC (smaller is better) 125.9 AICC (smaller is better) 126.6 # The "Fit Statistics" table lists useful statistics based on the maximized value of the log likelihood Parameter Estimates Standard Parameter Estimate Error DF t Value Pr > |t| Alpha Lower beta0 -0.5081 0.5952 31 -0.85 0.3998 0.05 -1.7221 beta1 -0.2082 0.2192 31 -0.95 0.3496 0.05 -0.6553 beta3 -0.5246 0.7392 31 -0.71 0.4832 0.05 -2.0321 beta4 -0.9737 0.6829 31 -1.43 0.1639 0.05 -2.3665 s2u 0.7323 0.8059 31 0.91 0.3705 0.05 -0.9113 Parameter Estimates Parameter Upper Gradient beta0 0.7058 -0.00007 beta1 0.2389 0.00005 beta3 0.9829 0.000064 beta4 0.4191 -0.00013 s2u 2.3759 0.000027 **************************************************************/ /************************************************************** Results of the analysis suggest: 1. Estimates of the fixed effects in the mixed effects logistic model are slightly larger in absolute value than in the marginal model 2. beta3 has interpretation in terms of the log odds of wheezing for a particular child. That is, the ratio of odds of wheezing for a given child whose mother smokes moderately, versus the same child (or a child with identical latent, underlying risk) whose mother smokes heavily, is 0.377 =exp(-0.9737 ). 3. Estimated variance of the random intercepts in relatively small (=.7323) This gives a sense of the heterogeneity of the subjects. **************************************************************/