ST 590G -- Computation for Data Analysis
Fourth Assignment -- due Tuesday, 27 November 2012

Every year, college football fans debate which team is the best. As fruitless as this discussion may be, it does suggest an interesting problem for statistical analysis. One simple statistical model needs only the home and away teams and the two teams' scores. (Adding overtime information is optional.) Your task is to get the data ready for the analysis of one conference for one year. Although several sites have college football data, I am recommending the Yahoo! site described further below.

As I see it, there are three main tasks:

1) For each week, construct a dataset of that week's results: home team, away team, and the score for each. Then put those datasets together. Delete games with nonconference opponents.

2) Construct a dataset with explanatory variables X1 through Xk, where k is the number of teams in the conference. For a given game between home team h and away team a, Xh is 1, Xa is -1, and all of the other Xj are 0 (these teams are not involved). Also construct a response variable Y whose value is the difference between the scores: home - away. (We will do something similar in a class exercise.)

3) Do the statistical analysis by running the following code:

      proc glm data=whatever ;
         model y = x1-xk ;
      run ;

   and report the ranking of the teams during the regular season. (There's no need to analyze the post-season games.)

Choose a year (2009, 2010, or 2011) and a conference (e.g. Atlantic Coast (acc), Big Ten (big10), Southeastern (sec), etc.) for analysis. For year 2009, week 9, and the ACC, the site is

   http://rivals.yahoo.com/ncaa/football/scoreboard?&w=9&y=2009&c=acc

For other weeks, change &w=9 to something else; for other years, change &y=2009; for other conferences, change &c=acc.

For (1), I can see two routes:

a) Read the text from the website and strip out the unnecessary html code. Use the patterns in the remaining text to determine the team names and scores.

b) From the source html code, find the patterns in how the four pieces of information that we need (home team, away team, and the two scores) are displayed. Then construct flags to find the team names and final scores.
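
To make the coding in task (2) concrete, here is a minimal sketch for a made-up three-team conference. Everything in it is an assumption for illustration only: the dataset names (games, design), the variable names (home, away, homescore, awayscore), the team labels A, B, C, and the scores are hypothetical and do not come from the Yahoo! site. Your own dataset from task (1) would take the place of the datalines.

```sas
/* Hypothetical toy input: one row per conference game.        */
/* Team labels and scores are invented for illustration.       */
data games ;
   input home $ away $ homescore awayscore ;
   datalines ;
A B 28 14
B C 21 24
C A 17 10
;
run ;

/* Build X1-X3 and Y as described in task (2).                  */
data design ;
   set games ;
   array x{3} x1-x3 ;
   /* team{j} is the (hypothetical) label of the team for Xj    */
   array team{3} $ 8 _temporary_ ('A' 'B' 'C') ;
   do j = 1 to 3 ;
      /* a comparison is 1 when true and 0 when false, so this  */
      /* gives 1 for the home team, -1 for the away team, and   */
      /* 0 for a team not involved in the game                  */
      x{j} = (home = team{j}) - (away = team{j}) ;
   end ;
   y = homescore - awayscore ;   /* home - away */
   drop j ;
run ;

proc print data=design ;
run ;
```

For the first toy game (A at home against B, 28 to 14), this gives x1 = 1, x2 = -1, x3 = 0, and y = 14. With a real conference, the 3 would be replaced by k and the list of labels by the k team names as they appear in your data.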