Tuesday, April 7, 2009

The Predictive Power of AP Poll Rankings

In my previous post, I showed that based on the AP Poll rankings for college football and basketball, upsets are just as likely at the beginning of the season as at the end. This means that the polls don't increase in accuracy over the season.
Here I show a few other graphs on topics that college sports fans might find interesting. These topics are:
1) The overall average winning percentage for each rank in the AP Poll
2) How often the higher-ranked team wins for a given difference in ranking
3) Whether the top ten rankings are more accurate than lower rankings
4) Whether the AP Poll rankings have changed in accuracy over the last fifty years or so
If the graphs are too small to see, click on them to see them in full-size.
Winning Percentages by Rank
This graph shows how successful ranked teams are against all opponents (including unranked opponents and higher ranked opponents). The numbers are calculated over the same years of data as in the previous post (roughly 50 seasons for basketball and 70 for football).
The team ranked number one wins 83% of its games in football and 87% of its games in basketball. This winning percentage falls steadily to 63% and 66%, respectively, for the team ranked 25th.
It's interesting how parallel the curves are for the two sports. This graph shows that the rankings are strong predictors of team success in subsequent games. Given how much attention is paid to rankings, it would be surprising if that were not true.
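For anyone who wants to reproduce this kind of tabulation, a minimal sketch in Python might look like the following. The file name and column layout here are stand-ins, not the actual format of my data files.

```python
import pandas as pd

# Assumed layout: one row per ranked team per game, with that team's
# AP rank going into the game and whether it won.
games = pd.read_csv("games.csv")  # assumed columns: sport, rank, won (0/1)

# Average winning percentage for each rank, separately by sport.
win_pct = (
    games.groupby(["sport", "rank"])["won"]
    .mean()
    .mul(100)
    .unstack("sport")
)
print(win_pct.round(1))  # rank 1 should come out near 83 (football) and 87 (basketball)
```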

Winning Percentage by Difference in Ranking
This graph (by request) shows how often the higher ranked team wins as a function of the difference in rank between the opponents. For example, when teams ranked one spot apart play each other (meaning number one playing number two, or number sixteen playing number seventeen), the higher ranked team wins 49% of the time in basketball and 51% of the time in football. When the difference in rankings is ten (for example, number ten playing number twenty, or number one playing number eleven), the higher ranked team wins 61% of the time in basketball and 65% of the time in football. These results seem sensible. It appears that a rank difference of one is almost meaningless, but that the predictive power of the difference in ranking rises fairly quickly after that.
The graph is noticeably more jagged when the rank difference is higher. This is because there are many fewer games where, for instance, teams ranked twenty spots apart play each other than teams ranked one spot apart. Fewer data points to average over means more random variation.
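A similar tabulation, grouping games by the difference in rank instead of by rank itself, also makes the small-sample issue visible. Again, the input format is an assumption rather than a description of my actual files.

```python
import pandas as pd

# Assumed layout: one row per game between two ranked teams.
games = pd.read_csv("ranked_games.csv")  # assumed columns: sport, winner_rank, loser_rank

games["rank_diff"] = (games["loser_rank"] - games["winner_rank"]).abs()
games["higher_ranked_won"] = games["winner_rank"] < games["loser_rank"]

# Win percentage of the higher ranked team and the number of games, by rank difference.
by_diff = games.groupby(["sport", "rank_diff"])["higher_ranked_won"].agg(["mean", "size"])
print(by_diff)  # the 'size' column shows how few games sit behind the largest differences
```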

Are the Top Ten Rankings More Accurate?
This graph shows that the top ten rankings are more accurate than the next fifteen. It repeats the analysis in the previous graph, but separates the data into games between teams ranked in the top ten and games between teams ranked 11-25. For a given difference in ranking, the higher ranked team is more likely to win if both teams are in the top ten than if both are in the next fifteen: in each sport, the line for the top ten is consistently above the line for 11-25.
This result isn't too surprising, although there are several reasons why it might be true. It could be that differences in talent are actually larger among the very best teams. Or perhaps it's just that the people who do the rankings have limited time and effort, and pay more attention to ranking the very best teams accurately.
One other point about this graph is that the difference in rank appears to be quite a bad predictor for teams ranked 11-25 when the difference in rank is greater than ten. There are very few actual games underlying those data points, so we probably shouldn't draw strong conclusions from them.
Aside for the extra curious: in a related regression analysis, I find that for a given rank difference, both teams being ranked one spot lower increases the chance of an upset by 0.6%, with a standard error of 0.2%.
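For those curious about the mechanics, that regression could be run roughly as the linear probability model sketched below. The variable names and data layout are my own placeholders; the 0.6% and 0.2% figures above come from the actual data, not from this sketch.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Assumed layout: one row per game between ranked teams, with an upset
# indicator, the rank difference, and the higher ranked team's rank.
games = pd.read_csv("ranked_games.csv")  # assumed columns: upset (0/1), rank_diff, favorite_rank

# Linear probability model: holding the rank gap fixed, a positive coefficient
# on favorite_rank means matchups lower in the poll produce more upsets
# (shifting both teams down one spot raises the upset probability by that amount).
model = smf.ols("upset ~ rank_diff + favorite_rank", data=games).fit()
print(model.summary())
```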

Are Rankings More Accurate Now than in the Past?
These graphs (similar to the graphs in the last post) show that the rankings aren't any more accurate now than they were, for example, fifty years ago. The graphs show the fraction of games that ended in upsets for each season back to 1956 in basketball and 1936 in football. If the AP Poll rankings became more accurate over time, then we would expect to see the graph slope downward over time. The graphs are actually quite flat.
This is surprising. Given a half-century of sports analysis, and significantly better information technology in the form of cable TV and computers, I expected that the sports writers doing the predicting would do better in recent years than in earlier years.
Note that these graphs only include games between ranked teams, but the conclusion doesn't change if we include games against unranked opponents. Also, as in the previous post, I have shown separate graphs in which I only include games in which teams in the top ten play each other. These graphs have more random variation, but the basic conclusion is the same.
More details about the data, including my sources, are available in the previous post.

Monday, March 9, 2009

Are college sports polls more accurate later in the season?

The question I'm interested in here is whether the AP Poll rankings in college football and basketball are more accurate later in the season when pollsters have seen more games and have more information about teams' abilities. To measure this I look at whether upsets--a lower ranked team beating a higher ranked team--occur more often early in the season than late in the season. This is what we'd expect to see if the rankings get more accurate as the season progresses. In the graphs below I show the percentage of games between ranked teams that end in upsets, graphed against the week of the season. These numbers are averages from college football seasons 1936-2007 and college basketball seasons 1956-2008. Since the number of teams that are ranked has changed over time and since some people believe that the top 10 rankings are much more accurate, I've also shown graphs that only include games between teams ranked in the top 10.


The main point is that all four graphs show no decrease over the course of the season. The overall average is close to 40% in all four graphs. Although the numbers have random variation, the overall trend is flat for each. On average, upsets take place just as frequently in week 15, for example, as in week one.
This is interesting because it means that the AP Poll journalists are just as good at ranking teams when they haven't yet seen them play as when they've seen them play for a whole season. Of course, they've seen the teams, or many of their players, play in previous seasons, which probably accounts for this. Before putting these numbers together my guess was that these graphs would show a decrease in upsets over the season.
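For reference, the week-by-week upset rates in these graphs can be tabulated with a few lines of Python; as above, the file and column names are placeholders for however the games are stored.

```python
import pandas as pd

# Assumed layout: one row per game between ranked teams, with the week of the
# season and an indicator for whether the lower ranked team won.
games = pd.read_csv("ranked_games.csv")  # assumed columns: sport, season, week, upset (0/1)

# Fraction of games ending in upsets in each week, averaged over all seasons.
upsets_by_week = games.groupby(["sport", "week"])["upset"].mean().mul(100)
print(upsets_by_week.round(1))  # a roughly flat profile near 40% is what the graphs show
```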


Notes:
My data sources are appollarchive.com, Prof. John Trono's NCAA Basketball Archive, James Howell's College Football Scores, and NCAA.org.
Only games where both teams are ranked are included. When teams play on the day a poll is released, their rankings are based on the new poll.
I include preseason polls, Bowl games and NCAA tournament games in the results. For football, all Bowl games are assigned a week number one greater than the final week of the regular season. For basketball, all NCAA Tournament games are assigned a week number one greater than the final week of the regular season. In both cases this is because there are no new polls after the end of the regular season and before any of the Bowls or Tournament games.
The basketball results above are missing seasons 1963-1967, 1978, and 1979.
In years when there is no preseason poll, I exclude all games that take place before the first poll, but I start counting the number of weeks when the games start.

Tuesday, March 3, 2009

Caffeine's Effects on Endurance and Sprinting in Rowing

Caffeine is one of the most widely used drugs and has been shown in many studies to enhance athletic performance--especially for endurance athletes (for example here (pdf), here and here; this one is specifically about rowing performance).
In 2004, the World Anti-Doping Agency and US Anti-Doping Agency removed caffeine from the list of banned substances (see here for rowing). Given how well-known the performance enhancing effects of caffeine are, it's very likely that many high-level athletes take caffeine prior to competing.
In order to find out what the effects of caffeine are on my own endurance performance, I performed an experiment in which I did 32 tests on the Concept 2 rowing machine over the course of 8 months. For half of the tests I took caffeine pills one hour before starting and for the other half I took a placebo (vitamin C pills). I drew the pills randomly and blinded myself to which pills I was taking so that at no point during the experiment did I know what pills I had taken for any test. I did the tests with roughly one week between each one, during which time I exercised moderately for about an hour a day. The dose of caffeine was two 200 mg pills (400 mg), which is roughly equivalent to 4-6 cups of coffee (a smaller dose than is often used in experimental studies and one that would have been unlikely to produce a violation when caffeine was a banned substance).
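For anyone setting up a similar self-experiment, one simple way to produce a balanced, blinded schedule is sketched below. This is an illustration, not the exact drawing procedure I used.

```python
import random

# Build a balanced, shuffled schedule of 32 tests: 16 caffeine, 16 placebo.
assignments = ["caffeine"] * 16 + ["placebo"] * 16
random.shuffle(assignments)

# Store the schedule somewhere the subject cannot see it until the experiment
# is over; a helper hands out the pills for each test without revealing the label.
with open("schedule.txt", "w") as schedule:
    for test_number, pill in enumerate(assignments, start=1):
        schedule.write(f"{test_number}\t{pill}\n")
```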
Each test had two parts. First, with no warm-up, I did one hour on the rowing machine at maximum effort at 20 strokes per minute. Second, five minutes after the end of the hour, I did a 300m (roughly 48 second) sprint at open stroke rate (the rating was usually 45-48 strokes per minute), again at maximum effort.
[Aside for those unfamiliar with the Concept 2 rowing machine: just like a treadmill or exercise bike, the rowing machine measures your power output as you simulate the rowing motion. Two common measures of power output that rowers use are watts and speed. Speed is often quoted in terms of split times per 500m or total meters for a given time. I use both measures of speed, as well as watts, below. The rowing machine also tells you how many strokes you take per minute (the "rating"), which I kept constant during the 60 minute tests. Finally, unlike a treadmill, and more like an exercise bike, the rowing machine does not limit your speed. You can go as hard or as easy as you want at any time, with the machine's resistance increasing as you increase your effort. As someone who is also a competitive runner, I can say that a 60 minute test on the rowing machine is qualitatively very similar to a 60 minute time trial running on the road.]
The results of the tests, charted over time, are shown below in meters for the hour test and in watts for the 300m test. Black symbols represent the caffeine tests and red symbols represent the placebo (non-caffeine) tests.

The caffeine had a very strong effect on my performance on the one hour test. On average, I went 177 meters further in an hour when I had caffeine (equal to 1.1 split seconds/500m or 9 average watts) than when I had the placebo. This is roughly a 3% difference in average watts between the caffeinated and non-caffeinated tests. The graph shows that this effect was fairly consistent, even as my overall power varied over the months of the experiment. The average of my caffeinated 60 minute tests was better than my best non-caffeinated 60 minute test. In fact, my best non-caffeinated 60 minute test was slower than my 11th-best caffeinated test.
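To see how those numbers fit together, the standard Concept 2 conversion between pace and power (watts = 2.80 / pace³, with pace in seconds per meter) can be applied to an assumed baseline distance. The 16,700 m baseline below is a round number chosen for illustration, not my actual average.

```python
# Concept 2 conversion: watts = 2.80 / pace^3, where pace is in seconds per meter.
def watts(meters: float, seconds: float) -> float:
    pace = seconds / meters
    return 2.80 / pace ** 3

HOUR = 3600.0
placebo_m = 16_700.0            # assumed baseline hour distance, for illustration only
caffeine_m = placebo_m + 177.0  # the average improvement reported above

w_gain = watts(caffeine_m, HOUR) - watts(placebo_m, HOUR)
pct_gain = 100 * (watts(caffeine_m, HOUR) / watts(placebo_m, HOUR) - 1)
split_gain = HOUR / placebo_m * 500 - HOUR / caffeine_m * 500  # seconds per 500m

print(f"+{w_gain:.1f} W ({pct_gain:.1f}%), {split_gain:.2f} split seconds/500m faster")
# With this assumed baseline: roughly +9 W, about 3%, and about 1.1 split seconds/500m.
```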
The caffeine had a relatively smaller effect on the 300m sprint. My average power was about 1.7% higher for the caffeinated sprints than the non-caffeinated ones. As the graph shows, the effect of caffeine on the 300m test is harder to distinguish from random variation. I don't know whether the caffeine would have a different effect on a sprint done without the hour test beforehand.


The effect I found on split times in the one hour test is similar in magnitude to the results of the study of caffeine use on rowing performance referenced above. In that study, taking caffeine before a 2000m test leads to roughly a 1% improvement in speed, which is about 1 split second/500m.
Two major limitations of my study are that the results are only for one person and that the tests I chose are not the standard 2k or 6k tests that are most relevant to competitive rowers. Both limitations are partially addressed by the other study of caffeine on rowing performance that I cite above. Also, if my schedule permits, I intend to do a similar study for 6k tests on the rowing machine this fall. My main limitation is in finding the time and motivation to do enough tests to get a meaningful result. Doing 32 one hour tests over the course of eight months was mentally difficult. For that reason, if I do the 6k study, it will have many fewer data points. I will leave studies of the 2k test to other experimenters.
My perception is that the use of caffeine among American rowers is not pervasive--even among those who compete internationally. I find this surprising given that it is well within the rules and that other countries' athletes may take every advantage possible (see short discussions of this here, here and here). Even when caffeine was "banned," the allowable limit was much higher than the level seen to produce performance effects. I'm especially surprised that I know of no coaches who address whether athletes should use caffeine during selection or during competition.
The issue is similar with pseudoephedrine, another drug that was taken off the banned list in 2004. It is less clear whether pseudoephedrine is useful as a performance aid, though (for example see here, here, here and here).

A few notes:
I am a retired competitive sculler. I retired from competition about three weeks before beginning this study in July 2008. I did most of my competing at the national level, with a few appearances in international competitions, including the World Championships. I am 6'3" tall and during this experiment my weight was consistently close to 90 kg or 200 lbs.
During the experiment, I was often (but not always) able to feel whether I had taken caffeine or not, even though I wouldn't actually get confirmation until the whole experiment ended. I noticed no tendency for the caffeine to dehydrate me, but it did seem to interfere with my sleep sometimes.
The caffeine pills and vitamin C pills were the same size, shape and texture. They did have a different taste, however. In order to keep this from affecting the results I would put the pills under my tongue and swallow them quickly with a glass of water. This worked quite well to keep me from tasting them.
For the 60 minute test I didn't sprint at the end. Instead, I maintained whatever speed I was holding with around eight to ten minutes left until the end. This allowed the 300m test to be the real measure of whether caffeine affects sprinting ability.
One of the studies I cite above finds that the effect of caffeine is smaller in those who use it frequently. I rarely have caffeine, so my results probably better represent what a non-caffeine user could expect. Also, some studies have found that getting caffeine via coffee is unreliable and may not produce the same performance effects. For that reason, I recommend caffeine pills to those who want to test the effect on their own performance.

For those who are interested: using a standard t-test, the p-value for the hypothesis that the caffeine had no effect is less than 0.1% for the 60 minute test and 5.9% for the 300m test.
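If you want to run the same check on your own results, a standard two-sample t-test is a few lines of Python. The numbers below are synthetic placeholders, not my actual test results.

```python
import numpy as np
from scipy import stats

# Synthetic placeholder data: 16 hour-test distances (meters) per group.
rng = np.random.default_rng(0)
caffeine = rng.normal(loc=16_850, scale=90, size=16)  # assumed values for illustration
placebo = rng.normal(loc=16_680, scale=90, size=16)

t_stat, p_value = stats.ttest_ind(caffeine, placebo)
print(f"t = {t_stat:.2f}, two-sided p = {p_value:.4g}")
```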