FINAL EXAM    ST 445    08 May 2012
ONE PAGE (ONE SIDE) OF NOTES
NAME ____________________________

For most of the questions on this exam, I am asking what the output will be from the SAS code.
*** For each dataset created, be sure to indicate the number of variables and the number of observations. ***
Note that the line numbers are given with the code, and remember that there's a blank column between the line numbers' field and any code or data.

1. a) How many observations, variables?
00001 data usaid ;
00002 input year foraid pres $ ;
00003 label foraid='foreign aid' pres='president' ;
00004 cards ;
00005 1961 4.24 Kennedy
00006 1962 4.53 Kennedy
00007 1963 5.06 Kennedy
00008 1964 4.92 Johnson
00009 1965 5.05 Johnson
00010 1966 5.05 Johnson
00011 1967 6.67 Johnson
00012 1968 6.74 Johnson
00013 ;
00014 run ;
00015 proc print data=usaid label ;
00016 id year ;
00017 title "US Foreign Aid in 1960's" ;
00018 run ;

b) What would be the result of the PROC PRINT?

c) Would the following statement give the same results?
00017 title 'US Foreign Aid in 1960's' ;

d) What does the following statement do?
00016 id year ;

2. a) How many observations, variables in dataset FOOTBALL?
00001 data football ;
00002 input year winner $ loser $ ; * @8 and @22 ;
00003 datalines ;
00004 1960   Philadelphia  Green Bay
00005 1961   Green Bay     New York
00006 1962   Green Bay     New York
00007 1963   Chicago       New York
00008 1964   Cleveland     Baltimore
00009 1965   Green Bay     Cleveland
00010 ;
00011 data both ;
00012 merge usaid football ;
00013 by year ;
00014 run ;
00015 proc print data=both label ;
00016 var year pres winner ;
00017 title2 "and NFL Champions" ;
00018 run ;

b) How many observations, variables in dataset BOTH?

c) What would be the result of the PROC PRINT?

d) The dataset FOOTBALL is a disaster. What went wrong?

e) Correct the code so that the value of LOSER in 1963 is 'New York'.

2. a) How many observations, variables in dataset BASEBALL?
00001 data baseball ;
00002 input year winner $ loser $ ; * @8 and @20 ;
00003 datalines ;
00004 1960   Pittsburgh  New York
00005 1961   New York    Cincinnati
00006 1962   New York    San Francisco
00007 1963   Los Angeles New York
00008 1964   St Louis    New York
00009 1965   Los Angeles Minnesota
00010 ;
00011 data both ;
00012 merge usaid baseball ;
00013 by year ;
00014 run ;
00015 proc print data=both label ;
00016 var year pres winner ;
00017 title2 "and World Series Champions" ;
00018 run ;

b) How many observations, variables in dataset BOTH?

c) What would be the result of the PROC PRINT?

d) The dataset BASEBALL is a disaster. What went wrong?

e) Correct the code so that the value of LOSER in 1963 is 'New York'.

3. a) How many observations, variables?
b) What is the output from this SAS program?
00001 data scores ;
00002 input t1 t2 gender $ group ;
00003 cards ;
00004 13 17 M 2
00005 10 18 M 1
00006 14 19 M 1
00007 9 14 F 1
00008 15 18 F 2
00009 16 13 F 3
00010 8 16 F 3
00011 ;
00012 run ;
00013 proc sort data=scores ;
00014 by gender group ;
00015 run ;
00016 proc plot data=scores ;
00017 plot t2*t1=group ;
00018 title 'completion times' ;
00019 by gender ;
00020 run ;

4. a) How many observations, variables?
b) What is the output from this SAS program?
00001 data team ;
00002 set scores ; * from #3 ;
00003 retain hmany 0 ;
00004 member = 'member' ;
00005 if( hmany > 2 ) then member = 'alternate' ;
00006 if ( min(t1,t2) < 12 ) then delete ;
00007 hmany = hmany + 1 ;
00008 run ;
00009 proc print data=team ;
00010 var gender member ;
00011 title2 'team members' ;
00012 run ;

c) Would the following statement give the same results? If not, what would be different?
00006 if ( min(t1,t2) ge 12 ) then output ;

3. a) How many observations, variables?
b) What is the output from this SAS program?
00001 data scores ;
00002 input t1 t2 gender $ group ;
00003 cards ;
00004 13 17 M 2
00005 10 18 M 1
00006 14 19 M 1
00007 9 14 F 1
00008 15 18 F 2
00009 16 13 F 3
00010 8 16 F 3
00011 ;
00012 run ;
00013 proc sort data=scores ;
00014 by group gender ;
00015 run ;
00016 proc plot data=scores ;
00017 plot t2*t1=gender ;
00018 title 'completion times' ;
00019 by group ;
00020 run ;

4. a) How many observations, variables?
b) What is the output from this SAS program?
00001 data team ;
00002 set scores ; * from #3 ;
00003 retain hmany 0 ;
00004 member = 'member' ;
00005 if( hmany > 2 ) then member = 'alternate' ;
00006 if ( min(t1,t2) < 12 ) then delete ;
00007 hmany = hmany + 1 ;
00008 run ;
00009 proc print data=team ;
00010 var gender member ;
00011 title2 'team members' ;
00012 run ;

c) Would the following statement give the same results? If not, what would be different?
00006 if ( min(t1,t2) ge 12 ) then output ;

5. Recall the sulfur dioxide emissions data that we looked at in class. Take a look at the attached sheet, which has SAS code in the left panel and output in the right panel. (fin12a.all)

a) How many observations, variables in dataset SO2?

b) What do the options in the TABLES statement do?

c) What would happen to the table if we omitted the FORMAT statement for either county or estemt? (choose only one)

The last part of the output is the result of a PROC CHART.

d) Describe what the chart would look like if we omitted the FORMAT statement.

e) If we just removed all of the options from the hbar statement, what kind of chart would we get?

f) Write an appropriate descriptive title for this chart.

6. In the file 'blinka.dat' are measurements of resin flow from pine trees. A portion of the file is given below. WRITE THE CODE to produce a dataset with

BLOCK    CLONE    RESIN    DIR
-------  -------  ------   -----
01       756       8.088   ns
01       756       8.257   ew
09       430       7.609   ew
09       443      10.246   ns
09       443      10.238   ew
09       308       8.762   ns
09       352       7.472   ns
...      ...      ...      ...
09       487       7.439   ns
09       487       7.533   ew

where the file 'blinka.dat' looks like

Florida 2007 Spring
blk  cl   vial  ns      ew
01   756  972     8.088   8.257
01   cc4  973     9.157   8.938
09   430  975        .    7.609
09   443  976    10.246  10.238
09   308  977     8.762       .
09   352  978     7.472   7.355
...  ...  ...
02   cc4  974     7.723   7.257
09   371  979     7.737   8.407
09   487  980     7.439   7.533

Note that each record yields two observations, except that missing values (denoted by '.') produce no observation and the clone 'cc4' is excluded.

7. A student trying to keep track of her final exams wrote the following code (with a minor logical error):

data a ;
keep course week dow when ;
* cccccccccccccccc ;
input course $ @14 day date9. time hhmmss8. ;
if( day < '06May2012'd ) then week='second' ;
else week='first' ;
dow = put(day,downame.) ; * what day of week ;
if( time > 60*60*12 ) then when='afternoon' ;
else when='morning' ;
    *678901234567890 just a ruler ;
cards ;
physics      02May2012 8:00:00
math         03May2012 13:00:00
stat         08May2012 8:00:00
botany       07May2012 8:00:00
;
run ;
proc print data=a ;
title 'my final exams' ;
run ;

a) Observations, variables in dataset FINALS?

b) What is the output from PROC PRINT?

c) Write a DROP statement equivalent to the KEEP statement.

d) What is the significance of 60*60*12? (Aside from the value of 43200.)

8. Give the output from this program (and recall the importance of 01 January 1960 -- and, remember, it has nothing to do with Fidel Castro.)

data a ;
input x @@ ;
y = ( x > '07JAN61'd ) ;
datalines ;
61 1765 -1355
;
proc print data=a ;
title 'all the way with NWAY' ;
run ;