One day I might write about the NFL without a featured image of cheerleaders. But not today.
The other day I was reading ESPN’s list of 3 Super Bowl sleepers. Not the sort of article to be taken terribly seriously, but it’s something interesting to read in these sporting dark days of summer. Within that article it said:
A recent study by a Harvard student listed Miami as the favorite to win the AFC East.
Which linked to an article called A Way-Too-Early Prediction of the NFL Season on the Harvard College Sports Analysis Collective (HSAC) website.
Even without Tom Brady for 4 games I’d still have the Patriots winning the AFC East, and the betting market does too, so I was intrigued to read what the Harvard Sports article had to say. If you can’t be bothered to click the link above, the summary is that they’re using a previous year’s Approximate Value (AV) stats from the Pro Football Reference site to predict each team’s Elo rating for this season, and then using that to calculate the win probability of each team in each game this season. I’ve shown their results below compared to the current best odds to qualify for the playoffs:
Ok, there are obviously a few issues here that don’t pass the highly scientific laugh test:
- Kansas City more likely to make the playoffs than Denver?
- New York Jets more likely to make the playoffs than not?
- What the hell happened to Dallas?
- What in seven hells has happened to the Ravens? Is my Grandmother their new starting QB?!
Another thing: a total of 12 teams will make the playoffs, so the % to make the playoffs from the Harvard model should total 1200%, which it does. Good. Now, each conference sends 6 teams, so the AFC teams and the NFC teams should total 600% each. But the AFC teams have a total of 640%, while the NFC teams only total 560%. (I can forgive them, as coding all the tiebreaker scenarios into a season simulator for the NFL is a total pain.)
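That sanity check is easy to automate. Here is a minimal sketch in Python. The function names are my own, and the four-team toy league (one playoff spot per conference) and its numbers are invented purely for illustration; they are not the HSAC figures.

```python
def conference_totals(playoff_pct):
    """Sum the playoff percentages per conference.

    playoff_pct maps team name -> (conference, % chance to make the playoffs).
    """
    totals = {}
    for conference, pct in playoff_pct.values():
        totals[conference] = totals.get(conference, 0.0) + pct
    return totals


def check_totals(playoff_pct, spots_per_conference=6, tolerance=1.0):
    """Return the conferences whose total is not 100% x playoff spots."""
    expected = 100.0 * spots_per_conference
    return {
        conference: total
        for conference, total in conference_totals(playoff_pct).items()
        if abs(total - expected) > tolerance
    }


# Invented toy league: 2 conferences, 1 playoff spot each (NOT real data).
toy = {
    "Team A1": ("AFC", 70.0),
    "Team A2": ("AFC", 50.0),
    "Team N1": ("NFC", 45.0),
    "Team N2": ("NFC", 35.0),
}

# The league-wide total is 200%, which looks fine, but each conference is off.
print(check_totals(toy, spots_per_conference=1))
# → {'AFC': 120.0, 'NFC': 80.0}
```

The toy numbers show the same pattern as above: the overall total is right, but one conference is over and the other is under by the same amount.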
Now, when the model you create spits out numbers that are wildly different to the betting market, it’s usually because you’ve done something wrong.
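To compare a model with the market you first have to turn the odds into probabilities. Here is a rough sketch, assuming decimal odds and a two-way "make the playoffs: yes / no" market; the function names and the example prices are made up for illustration. The raw implied probabilities on both sides add up to more than 100% because of the bookmaker's margin, so the sketch rescales them to sum to 1.

```python
def implied_probability(decimal_odds):
    """Raw implied probability of a single price (bookmaker margin included)."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1")
    return 1.0 / decimal_odds


def no_vig_probability(yes_odds, no_odds):
    """Implied probability of 'yes' after removing the bookmaker's margin.

    Assumes the margin is spread proportionally over both sides,
    which is a simplification.
    """
    p_yes = implied_probability(yes_odds)
    p_no = implied_probability(no_odds)
    return p_yes / (p_yes + p_no)


# Invented prices: 1.50 to make the playoffs, 2.50 to miss them.
market_prob = no_vig_probability(1.50, 2.50)
print(round(market_prob, 3))
# → 0.625
```

If a model says 30% for a team the market prices at around 62%, that gap is the thing to explain before trusting the model.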
The author mentions some of the problems that will result from using AV:
This will inflate the odds for teams who plan to stick with a struggling rookie through thick and thin, and hurt teams who find a phenom rookie
and
So this model favors ageing teams and may hinder up-and-coming teams
You can read about how AV is calculated here, and its related posts, including the assumptions used. I’ve read through them and I’ve come to the conclusion that this statistic is not very useful, not in a descriptive sense, and less so in a predictive sense. Check how players’ AV compares year on year:
I compared players’ 2012 AV to their 2013 AV, and also their 2013 AV to their 2014 AV (where possible). You would think a good player in terms of AV would continue to be good the next year, and vice versa with a bad player. Well, the R² is 0.4231, which means that only 42% of the variation in a player’s AV is explained by their AV the previous season. So using AV to predict next year’s AV is not going to be accurate…and so in turn the rest of your model is not going to be accurate…and then you end up saying that the Jets have a 56% chance to make the playoffs while the Ravens only have a 9% chance (which is only 6 percentage points more than the Jags!)
Prediction is hard, it is. I’d recommend reading Nate Silver’s book on it. A very good indicator is the betting market, but an article just quoting implied probabilities isn’t that creative or interesting. But if you create a model and it doesn’t beat the betting market, or at least draw even with it, then your model is just not good.
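For anyone who wants to repeat the year-on-year check, here is a minimal sketch of the R² calculation for a simple one-variable linear fit, where R² equals the squared Pearson correlation. The function name is my own and the AV values are invented placeholders, not the Pro Football Reference data used above.

```python
def r_squared(previous, current):
    """R² of a simple linear regression of `current` on `previous`.

    With one predictor this equals the squared Pearson correlation.
    """
    if len(previous) != len(current) or len(previous) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(previous)
    mean_x = sum(previous) / n
    mean_y = sum(current) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(previous, current))
    sxx = sum((x - mean_x) ** 2 for x in previous)
    syy = sum((y - mean_y) ** 2 for y in current)
    if sxx == 0 or syy == 0:
        raise ValueError("a sample with no variation has no defined R²")
    return sxy ** 2 / (sxx * syy)


# Invented placeholder AV values for four players in consecutive seasons.
av_year_1 = [1, 2, 3, 4]
av_year_2 = [1, 3, 2, 4]

print(r_squared(av_year_1, av_year_2))
# → 0.64
```

An R² of 1 would mean last season’s AV pins down this season’s AV exactly, and the lower it gets, the less last season tells you.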