Sunday, March 14, 2010

Analyzing error in WR prediction methods

In our last post, we looked at some charts that told us how wide receivers perform based on the number of years that they have spent in the league. Today, we will use these data to come up with prediction methods, then measure and minimize the error of each method. For the purpose of prediction, we will use 2007 and 2008 statistics to try to predict 2009 performance. The error is the difference between predicted and actual 2009 performance. We will be using the RMS error method outlined a few posts below for calculating error.

First of all, we limited ourselves to players who had at least 20 catches in each of 2007, 2008, and 2009. That gave us a list of 48 players to use in this analysis. A bigger database going back more years might give us more players to analyze, but recent changes in rules, enforcement, and offensive philosophies may make that older data less relevant.

The most important thing to try to predict for a WR is catches. Turns out, this is also one of the most difficult. We first tried the very simple method of using the last two years' stats in a weighted average to come up with a prediction for catches. It turns out that to minimize error, we had to use 30% of 2007 catches and 70% of 2008 catches. In other words, if a player caught 50 passes last year and 40 passes the year before, the best prediction for next year would be 50 * 0.7 + 40 * 0.3 = 47 catches.
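As a minimal sketch of that weighted average (the function name and argument names are just illustrative; the 30/70 split is the one found above):

```python
def predict_catches(two_years_ago, last_year, w_old=0.3, w_recent=0.7):
    """Weighted average of a receiver's last two seasons of catches."""
    return w_old * two_years_ago + w_recent * last_year

# The example from the text: 40 catches two years ago, 50 catches last year.
prediction = predict_catches(40, 50)
print(round(prediction, 1))  # 47.0
```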

We then tried to see if we could use improvement per year of experience to predict how many catches a WR might make. There are two years' worth of stats that can be used here. First, you can take a player's 2008 stats and predict 2009 stats from his experience in 2009. Also, you can take 2007 stats, predict 2008 stats from his experience in 2008, then from that value predict 2009 stats from his experience in 2009. We used a weighted average of these two projections to minimize error, and found that the minimal error was achieved when only 2008 stats were used and 2007 stats were completely ignored. This makes sense because the 2007 stats go through the prediction model twice, which compounds any error associated with the model.
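Here is a rough sketch of the two projection paths and how they get blended. The two improvement ratios (4.0 and 1.7) are the ones from the "WR Stats with Experience" post; the table layout, the function names, and the sample player are made up for illustration.

```python
# Year-over-year catch ratios, keyed by the season of experience being projected.
# Only the two ratios quoted on this blog are filled in; a real table would cover more seasons.
IMPROVEMENT = {2: 4.0, 3: 1.7}

def project_forward(catches, from_season, to_season):
    """Apply the improvement ratio once for every season between from_season and to_season."""
    for season in range(from_season + 1, to_season + 1):
        catches *= IMPROVEMENT[season]
    return catches

def blended_projection(catches_2007, catches_2008, season_in_2009, w_2008=1.0):
    """Blend the one-step (from 2008) and two-step (from 2007) projections for 2009."""
    one_step = project_forward(catches_2008, season_in_2009 - 1, season_in_2009)
    two_step = project_forward(catches_2007, season_in_2009 - 2, season_in_2009)
    return w_2008 * one_step + (1 - w_2008) * two_step

# Hypothetical player: rookie in 2007 with 20 catches, 45 catches in 2008, third season in 2009.
print(blended_projection(20, 45, season_in_2009=3))             # 2008 stats only
print(blended_projection(20, 45, season_in_2009=3, w_2008=0.5)) # even blend of both paths
```

With w_2008 left at 1.0, the two-step path drops out entirely, which is the weighting that gave us the lowest error.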

We tried combining these two methods (using stats only and using stats and improvement), and found that using stats only by itself gave minimal error. So the only time when using improvement by experience is helpful is for rookies that do not have 2 years' worth of stats to predict with.

Next, we looked at how to predict yards per catch. Using only a weighted average of 2007 and 2008 stats, we found that error was minimized with a weight of 0.8 on the stats from 2 years ago and 0.2 on the stats from last year. This is a rather surprising result, and it shows that yards per catch is actually pretty volatile and hard to predict. If we take a weighted average of this result and the average yards per catch for a player at this experience level, we find that the error is minimized with a weight of 0.4 on previous years' stats and 0.6 on the average for the player's experience level. This shows us that how long a player has been in the league has slightly more to do with yards per catch than previous years' stats. The blend uses enough of previous years' stats to ensure that the speedsters will still have above-average yards per catch, but they will come down to the pack as they start to age.

Finally, we looked at predicting TDs per catch. Using a weighted average from 2007 and 2008, we find that error is minimized with a weight of 0.2 from 2 years ago and 0.8 from the last year. This tells us that TDs per catch are much more predictable than yards per catch, and players tend to follow any trend that they had established the previous year. Combining this with the average player's stats per year of experience, we find that a perfectly balanced weight of 0.5 for the individual player's stats and 0.5 for the average stats for the player's experience level gives us minimal error. So age does play a role in determining how many TDs a player is going to score, but not as much of a role as it does for yards per catch.
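The two blends above (yards per catch and TDs per catch) have the same shape, so one small function can sketch both. The weights are the ones we found; the function name and the sample player numbers are invented for illustration only.

```python
def blend(player_value, experience_avg, player_weight):
    """Mix a player's own stat with the average for his experience level."""
    return player_weight * player_value + (1 - player_weight) * experience_avg

# Hypothetical speedster: 17.0 yards per catch from his own weighted stats,
# against an assumed 13.0 average for receivers at his experience level.
ypc = blend(17.0, 13.0, player_weight=0.4)

# Hypothetical TD rate: 0.10 TDs per catch for the player,
# against an assumed 0.06 average for his experience level.
td_per_catch = blend(0.10, 0.06, player_weight=0.5)

print(round(ypc, 2), round(td_per_catch, 3))  # 14.6 0.08
```

The made-up speedster still projects above the assumed average (14.6 vs. 13.0), but well below his own 17.0, which is the "come down to the pack" effect described above.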

Tuesday, March 9, 2010

WR Stats with Experience

Using some of the methods outlined in previous posts, we took a look at WR trends over the past 3 seasons. First, we calculated the yards per catch for receivers based upon experience:


So there is a little up-and-down over the first 10 years, but it is pretty subtle (an 8% drop between years 6 and 10). Again, we see an actual increase after year 10, which I attribute to only the better receivers remaining on teams after 10 years in the league.

If we look at TDs as a percentage of receptions, we get:

So just as with running backs, it appears that wide receivers are able to get into the end zone much more often as a percentage of touches. For the rest of the career, players trend slightly upwards. The spike in year 10 is just an anomaly.

Now, let's take a look at how wide receivers improve from year to year in terms of catches.

So here we see the development cycle of receivers. There is a 4-to-1 improvement from the rookie season to year 2, and a 1.7-to-1 improvement from year 2 to year 3. So a rookie that gets 10 catches could be expected to get 40 catches in year 2 and 68 in year 3. This pattern of improvement actually continues through year 7.

Next up: calculating the error.

Friday, March 5, 2010

Player Movement Tracker

Hello. As today is the first day of free agency, I will attempt to compile a list of all player movement that has a direct impact on fantasy football. I won't be dealing with any rumors, third-string defensive back movement, etc. Just the done deals that will affect a player's fantasy football stats.

I will start with what has happened this morning, but I will continue to update this post as new moves are made all the way up to training camp.

OFFENSIVE PLAYERS
Nate Burleson - WR - From Seattle to Detroit
Brandon Manumaleuna - TE - From San Diego to Chicago
Chester Taylor - RB - From Minnesota to Chicago
Anquan Boldin - WR - From Arizona to Baltimore
Kassim Osgood - WR - From San Diego to Jacksonville
David Carr - QB - From Giants to San Francisco
Arnaz Battle - WR - From San Francisco to Pittsburgh
Marcus Mason - RB - From Washington to San Diego
Antwan Randle El - WR - From Washington to Pittsburgh
Reggie Brown - WR - From Philadelphia to Tampa Bay
Seneca Wallace - QB - From Seattle to Cleveland
Jim Sorgi - QB - From Indianapolis to Giants
Donte Stallworth - WR - From Prison to Baltimore
Thomas Jones - RB - From Jets to Kansas City
Antonio Bryant - WR - From Tampa Bay to Cincinnati
Jerheme Urban - WR - From Arizona to Kansas City
Ben Watson - TE - From New England to Cleveland
Hank Baskett - WR - From Indianapolis to Philadelphia
Larry Johnson - RB - From Cincinnati to Washington
Jake Delhomme - QB - From Carolina to Cleveland
Luke McCown - QB - From Tampa Bay to Jacksonville
Chris Baker - TE - From Jets to Seattle
Brady Quinn - QB - From Cleveland to Denver
LaDanian Tomlinson - RB - From San Diego to Jets
Peyton Hillis - RB - From Denver to Cleveland
Shaun Hill - QB - From San Francisco to Detroit
Ruvell Martin - WR - From St Louis to Seattle
Derek Anderson - QB - From Cleveland to Arizona
Rex Grossman - QB - From Houston to Washington
Charlie Whitehurst - QB - From San Diego to Seattle
Quinton Ganther - RB - From Washington to Seattle

DEFENSIVE IMPACT PLAYERS
Antonio Cromartie - CB - From San Diego to Jets
Corey Williams - DL - From Cleveland to Detroit
Julius Peppers - DL - From Carolina to Chicago
Dunta Robinson - CB - From Houston to Atlanta
Karlos Dansby - LB - From Arizona to Miami
Antrel Rolle - S - From Arizona to Giants

Thursday, March 4, 2010

Predicting RB Touchdowns

Using the same methods detailed in the last few posts, I started looking at how running backs' touchdowns are affected by age and experience.

First, a look at TD percentage (per carry) as a function of player age:
This curve looks OK, with a slight Fred Taylor effect, but an unexplainable notch appears at age 24.

Now for TD percentage as a function of player experience:


This one makes perfect sense. Rookies and 2nd-year players do OK, then see a huge spike in year three. After that, it is more or less all downhill.

If we calculate the RMS error of simply using a player's last two years' stats to predict next year's TDs, we get 4.56 TDs of error. Using age, the number falls to 3.66, and using experience, the error is 3.18 TDs.

Once again, it appears that using a combination of previous years' stats and player experience gives us the best prediction of future player performance. This is heartening, since that is the same thing we found when looking at RB yards per carry. Hopefully, we have come across a method that will work for all positions.

Next up: Wide receivers.

Monday, March 1, 2010

Sources of Error in RB YPC Projections

In the last few articles, I have shown some average yards per carry and improvement for running backs by age and by experience. In this article, we are going to examine the error we get if we apply these curves to predict 2009 statistics from 2008 and 2007 stats.

We will calculate error on a root-mean-squared (RMS) basis. In other words, we will take our projected 2009 yards per carry, subtract the actual 2009 yards per carry, square the result, add up the squared value for all the players, divide by the number of players, and, finally, take the square root of the result. This gives us a measure that is equally weighted for too high and too low errors, and in units of yards per carry.
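The steps above can be sketched in a few lines. This is just one way to write it; the function name is mine, and the sample numbers are made up purely to show the arithmetic (they are not real player data).

```python
import math

def rms_error(projected, actual):
    """Root-mean-squared error between two equal-length lists of numbers."""
    if len(projected) != len(actual) or not projected:
        raise ValueError("need two non-empty lists of equal length")
    # Subtract, square, sum, divide by the number of players, then take the square root.
    squared = [(p - a) ** 2 for p, a in zip(projected, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up yards-per-carry values for three players.
projected_ypc = [4.0, 4.5, 3.5]
actual_ypc = [4.5, 4.0, 3.5]
print(round(rms_error(projected_ypc, actual_ypc), 4))  # 0.4082
```

Note that the first player was projected too low and the second too high by the same amount, and both contribute equally to the result.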

As a baseline, we will figure out what the RMS error is if we use last year's value or an average of the last two years.
Using last year's YPC = 0.7597
Using avg of last 2 years' YPC = 0.6650

So the system we are trying to come up with had better have an RMS error of less than 0.6650, or all this fancy cipherin' will be for nothing.

So we will start with the worst and work our way up to the best.

In last place, we have YPC improvement using the player's age, with a projection error of 0.8836. Funny, since I put in yesterday's post that this was my favorite based upon the curve shape, but it seems to not do very well.

Next is YPC improvement using the player's experience, with an error of 0.8443. So it looks like multiplying last year's stats by a player improvement average is actually less accurate than just using the previous years' stats.

Using average YPC with age results in an error of 0.6206, so we finally have something that is better than just using the player's stats from the previous year.

Finally, using average YPC with experience results in an error of 0.6166. So just using a player's experience as opposed to any prior performance is the way to go? That just didn't seem right. So I decided to use a percentage of prior performance and a percentage of experience for projecting. I ran an analysis to find which split of percentages would minimize error, and found that a split of 20% prior statistics and 80% YPC per experience resulted in a minimal error of 0.5266.
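For anyone curious, the search for the best split can be sketched as a simple sweep over candidate weights. This is an illustration rather than the exact analysis I ran: the function names and the step size are assumptions, and the three sample players are made up (and deliberately constructed so the best split is easy to see).

```python
import math

def rms_error(projected, actual):
    """Root-mean-squared error between two equal-length lists."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(projected, actual)) / len(actual))

def best_split(prior, experience_avg, actual, steps=10):
    """Try prior-stat weights 0.0, 0.1, ..., 1.0 and return the (weight, error) pair with the lowest RMS error."""
    best = None
    for i in range(steps + 1):
        w = i / steps
        projected = [w * p + (1 - w) * e for p, e in zip(prior, experience_avg)]
        err = rms_error(projected, actual)
        if best is None or err < best[1]:
            best = (w, err)
    return best

# Made-up YPC values for three players (not real data).
prior = [5.0, 3.0, 4.0]           # player's own prior YPC
experience_avg = [4.0, 4.0, 4.0]  # average YPC for his experience level
actual = [4.2, 3.8, 4.0]          # what he actually did

weight, error = best_split(prior, experience_avg, actual)
print(weight)  # 0.2
```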

Still, putting so much more weight on experience than prior performance does not feel right. For this analysis, I used only players with 100 or more carries in each of the 3 years of the analysis, so there were only 20 players used. This small sample size could be cause for error, so for 2010 stats, I will use 50% of the player's last 3 years' average, and 50% of the average YPC per experience of the player.
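A quick sketch of that 2010 formula follows. The function name and the sample numbers are made up, and I am assuming a plain average of the three seasons' YPC here rather than one weighted by carries.

```python
def project_ypc_2010(last_three_ypc, experience_avg_ypc):
    """50% of the player's 3-year average YPC plus 50% of the average YPC for his experience level."""
    # Assumption: a simple mean of the three season values, not weighted by carries.
    player_avg = sum(last_three_ypc) / len(last_three_ypc)
    return 0.5 * player_avg + 0.5 * experience_avg_ypc

# Hypothetical back: 4.0, 4.4, and 4.2 YPC over the last three seasons,
# with an assumed 4.0 average for backs at his experience level.
projection = project_ypc_2010([4.0, 4.4, 4.2], 4.0)
print(round(projection, 2))  # 4.1
```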

When I'm able to get more years of stats in my database, and when I have time (which may not be until next offseason), I will revisit these numbers to see how the formula should be tweaked.

Next up: running back TDs.