Sunday, March 14, 2010

Analyzing error in WR prediction methods

In our last post, we looked at some charts that told us how wide receivers perform based on the number of years they have spent in the league. Today, we will use these data to come up with prediction methods, then measure and minimize the error of each one. For the purpose of prediction, we will use 2007 and 2008 statistics to try to predict 2009 performance. The error is the difference between predicted and actual 2009 performance, and we will calculate it with the RMS error method outlined a few posts below.
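
For reference, here is a minimal sketch of an RMS error calculation in Python, assuming the standard definition (the square root of the mean of the squared differences between predicted and actual values). The function name `rms_error` is hypothetical, and the numbers in the example are made up for illustration, not real player stats.

```python
import math

def rms_error(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Toy predicted vs. actual catch totals (made up, for illustration only).
print(rms_error([47, 60, 30], [50, 56, 30]))
```

Squaring before averaging means one badly missed player counts for more than several small misses, which is why RMS error is a reasonable yardstick for comparing prediction methods.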

First of all, we limited ourselves to players who had at least 20 catches in each of 2007, 2008, and 2009. That gave us a list of 48 players to use in this analysis. A bigger database going back more years might give us more players to analyze, but recent changes in rules, enforcement, and offensive philosophies may make that older data less relevant.

The most important thing to predict for a WR is catches, and it turns out to be one of the most difficult. We first tried the very simple method of taking a weighted average of the last two years' stats to come up with a prediction for catches. Error was minimized with a weight of 30% on 2007 catches and 70% on 2008 catches. In other words, if a player caught 50 passes last year and 40 passes the year before, the best prediction for next year would be 50 * 0.7 + 40 * 0.3 = 47 catches.
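
The two-year weighted average, and a search for the weight that minimizes RMS error, might look like the sketch below. The post does not say how the best weights were found, so this assumes a simple grid search over weights in steps of 0.1; the function names and the `(stat_2007, stat_2008, stat_2009)` tuple layout are hypothetical.

```python
import math

def weighted_prediction(last_year, two_years_ago, w_last):
    """Weighted average of the two most recent seasons."""
    return w_last * last_year + (1 - w_last) * two_years_ago

def rms_error(predicted, actual):
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def best_weight(players, step=0.1):
    """Grid-search the weight on last year's stat that minimizes RMS error.

    players: list of (stat_2007, stat_2008, stat_2009) tuples (assumed layout).
    Returns (best_weight, best_error).
    """
    best = None
    for i in range(round(1 / step) + 1):
        w = i * step
        preds = [weighted_prediction(s08, s07, w) for s07, s08, _ in players]
        err = rms_error(preds, [s09 for _, _, s09 in players])
        if best is None or err < best[1]:
            best = (w, err)
    return best

# The worked example from the text: 50 catches last year, 40 the year before.
print(round(weighted_prediction(50, 40, 0.7), 6))  # 47.0
```

Run over the 48 players' actual 2007 to 2009 catch totals, a search like this should land near the 0.7 weight on last year reported above, though a finer step could shift the answer slightly.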

We then tried to see if we could use improvement per year of experience to predict how many catches a WR might make. There are two years' worth of stats that can be used here. First, you can take a player's 2008 stats and project his 2009 stats from his experience level in 2009. Second, you can take his 2007 stats, project his 2008 stats from his experience level in 2008, and then project 2009 from that value using his experience level in 2009. We took a weighted average of these two projections to minimize error, and found that the minimal error was achieved when only the 2008-based projection was used and the 2007-based one was ignored completely. This makes sense because the 2007 route applies the prediction model twice, which compounds any error associated with the model.
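
The two experience-based routes can be sketched as follows. This is only an illustration under a stated assumption: it models "improvement per year of experience" as the ratio between league-average stats at consecutive experience levels, which may not be exactly how the original model worked. The names `project_by_experience`, `experience_predictions`, and the `avg_by_exp` table are hypothetical, and the experience curve is made up.

```python
def project_by_experience(stat, from_exp, to_exp, avg_by_exp):
    """Scale a stat by the league-average change between two experience levels.

    avg_by_exp maps years of experience -> average stat for WRs at that level.
    The ratio-based adjustment is an assumption for this sketch.
    """
    return stat * avg_by_exp[to_exp] / avg_by_exp[from_exp]

def experience_predictions(stat_2007, stat_2008, exp_2009, avg_by_exp):
    """Return (projection from 2008, projection from 2007) for 2009."""
    exp_2008, exp_2007 = exp_2009 - 1, exp_2009 - 2
    # Route 1: a single step, from the 2008 season to 2009.
    from_2008 = project_by_experience(stat_2008, exp_2008, exp_2009, avg_by_exp)
    # Route 2: two steps, 2007 -> 2008 -> 2009, so the model is applied twice.
    via_2008 = project_by_experience(stat_2007, exp_2007, exp_2008, avg_by_exp)
    from_2007 = project_by_experience(via_2008, exp_2008, exp_2009, avg_by_exp)
    return from_2008, from_2007

# Made-up average catches by years of experience, for illustration only.
avg = {1: 30.0, 2: 40.0, 3: 50.0, 4: 55.0}
print(experience_predictions(30, 50, 4, avg))
```

Whatever form the adjustment takes, the second route passes through the model twice, so any mismatch between the average curve and a particular player gets applied on top of itself.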

We tried combining these two methods (stats only, and stats plus improvement) and found that the stats-only method by itself gave the minimal error. So the only time improvement by experience is helpful is for rookies who do not have two years' worth of stats to predict with.
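
The resulting decision rule can be sketched like this: use the stats-only weighted average when two seasons are available, and fall back on experience otherwise. The one-season fallback reuses the ratio assumption from the previous sketch, and using the league average for a player with no NFL history is a further assumption of ours; `predict_catches`, its `history` list, and the `avg_by_exp` table are all hypothetical.

```python
def predict_catches(history, exp_next, avg_by_exp, w_last=0.7):
    """Predict next season's catches.

    history: catch totals for past seasons, oldest first (assumed structure).
    exp_next: years of experience the player will have next season.
    avg_by_exp: years of experience -> league-average catches (assumed input).
    """
    if len(history) >= 2:
        # Two seasons available: stats-only weighted average (0.7 / 0.3).
        return w_last * history[-1] + (1 - w_last) * history[-2]
    if len(history) == 1:
        # One season: adjust it by the average change in experience (ratio assumption).
        return history[-1] * avg_by_exp[exp_next] / avg_by_exp[exp_next - 1]
    # No history at all: use the average for that experience level (assumption).
    return avg_by_exp[exp_next]

# Made-up average catches by years of experience, for illustration only.
avg = {1: 30.0, 2: 40.0, 3: 50.0}
print(predict_catches([40, 50], 3, avg))
print(predict_catches([30], 2, avg))
```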

Next, we looked at how to predict yards per catch. Using only a weighted average of 2007 and 2008 stats, we found that error was minimized with a weight of 0.8 on the stats from two years ago and 0.2 on the stats from last year. This is a rather surprising result, which suggests that yards per catch is actually pretty volatile and hard to predict from one season to the next. When we took a weighted average of this result and the average yards per catch for players at the same experience level, error was minimized with a weight of 0.4 on the previous years' stats and 0.6 on the experience-level average. This tells us that how long a player has been in the league has slightly more to do with yards per catch than his previous stats do. The blend still uses enough of the previous years' stats to ensure that the speedsters keep above-average yards per catch, but they will drift back toward the pack as they start to age.
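
Put together, the yards-per-catch prediction is a two-stage blend: first the 0.8 / 0.2 average of the two seasons, then a 0.4 / 0.6 mix of that result with the experience-level average. A minimal sketch follows; the function name `predict_ypc` and the speedster's numbers are made up for illustration.

```python
def predict_ypc(ypc_2007, ypc_2008, exp_avg_ypc, w_2007=0.8, w_stats=0.4):
    """Two-stage yards-per-catch prediction using the weights from the post.

    exp_avg_ypc: average YPC for WRs at the player's 2009 experience level
    (assumed to be available from the experience charts).
    """
    # Stage 1: weighted average of the two seasons (0.8 on 2007, 0.2 on 2008).
    stats_part = w_2007 * ypc_2007 + (1 - w_2007) * ypc_2008
    # Stage 2: blend with the experience-level average (0.4 stats, 0.6 average).
    return w_stats * stats_part + (1 - w_stats) * exp_avg_ypc

# Hypothetical speedster: 18.0 and 17.0 YPC, against an experience average of 13.0.
print(round(predict_ypc(18.0, 17.0, 13.0), 2))  # 14.92
```

The speedster still projects above the 13.0 average, but well below his own recent seasons, which matches the "back toward the pack" behavior described above.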

Finally, we looked at predicting TDs per catch. Using a weighted average of 2007 and 2008, we found that error was minimized with a weight of 0.2 on two years ago and 0.8 on last year. This tells us that TDs per catch are much more predictable than yards per catch, and that players tend to follow any trend they established the previous year. When we combined this with the average stats per year of experience, a perfectly balanced weight of 0.5 on the individual player's stats and 0.5 on the average for his experience level gave us minimal error. So age does play a role in determining how many TDs a player is going to score, but not as much of a role as it does for yards per catch.
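
The TDs-per-catch prediction follows the same two-stage pattern with different weights (0.2 / 0.8 across seasons, then 0.5 / 0.5 with the experience average). The sketch below also turns the per-catch rates into season totals, assuming projected yards and TDs are simply predicted catches times the predicted rates; the post does not spell that step out. The function names and all inputs are hypothetical.

```python
def predict_td_rate(tdpc_2007, tdpc_2008, exp_avg_tdpc, w_2008=0.8, w_stats=0.5):
    """Two-stage TDs-per-catch prediction using the weights from the post."""
    # Stage 1: weighted average of the two seasons (0.2 on 2007, 0.8 on 2008).
    stats_part = (1 - w_2008) * tdpc_2007 + w_2008 * tdpc_2008
    # Stage 2: even blend with the average for the player's experience level.
    return w_stats * stats_part + (1 - w_stats) * exp_avg_tdpc

def project_season(catches, ypc, tdpc):
    """Turn per-catch rates into season totals (assumed: catches x rate)."""
    return {"catches": catches, "yards": catches * ypc, "tds": catches * tdpc}

# Made-up inputs: 6 TDs on 60 catches in 2007, 8 TDs on 64 catches in 2008,
# and an experience-level average of 0.08 TDs per catch.
rate = predict_td_rate(6 / 60, 8 / 64, 0.08)
print(round(rate, 4))  # 0.1
print(project_season(47, 14.92, rate))
```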
