As I’ve told some folks, I’ve been setting up a system for NFL data off-and-on over the years. Naturally, this started in Excel, but last year I took the plunge to apply some of the Data Engineering concepts from my time at Databricks and put them into practice.
We’ll get more into the details of all this over time, but for now, I need to get the week one projections out there!
The long-and-short of these projections is that they are the result of an ensemble model that draws on roughly 400 data points per NFL game from the past 12 years. This model is then applied to the current slate of games to estimate the likelihood of each team winning.
Right now, I’d say the model is most certainly a version 1.0, and I do plan to make some updates to it down the road. But it works, and I figured this season is as good as any to 1) showcase that the Death Star model is fully operational, and 2) evaluate its predictions.
So with that said, here is the slate for week one:
I’ll get into the math a bit more at a later date, but for now, all you need to know is this:
The line: all games are given a line by the bookies in Vegas, which establishes which team is favored to win and by how many points. A negative line reflects a favorite, e.g. the Chiefs are favored to win by 4.5 points over Detroit tonight.
Implied Odds: given the line, it is straightforward to calculate an implied likelihood of the team winning. Broadly, this can be done with historical data and an understanding of how Vegas prices the lines to ensure they make money no matter which team wins. For example, a -3 line (a common line) reflects a ~58% chance that the favored team will win.
My Model: as noted, we have an ensemble model that predicts the winner based on ~400 data points per game, trained on all games from the past 12 years. The model projects a probability that a team will win the game.
Delta: This is the difference between my model's probability and the implied odds. Broadly, a positive delta means my model thinks a team is undervalued and is thus a good bet. Teams with a substantial delta (>5%) are tagged here with the green circle.
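To make the delta idea concrete, here is a minimal sketch in Python. It is not the model's actual code, and it makes one big simplifying assumption: it reads the implied odds off a money line (like the +190 style prices used for scorekeeping below) rather than deriving them from the point spread and historical data as described above. It also ignores the bookmaker's cut. The function names and the 40% model probability are made up for illustration.

```python
def implied_probability(moneyline: int) -> float:
    """Break-even win probability for an American money line.

    Assumption: no adjustment is made for the bookmaker's margin.
    """
    if moneyline == 0:
        raise ValueError("a money line of 0 is not a valid price")
    if moneyline > 0:
        # Underdog: risk 100 to win `moneyline`.
        return 100 / (moneyline + 100)
    # Favorite: risk `-moneyline` to win 100.
    return -moneyline / (-moneyline + 100)


def delta(model_prob: float, implied_prob: float) -> float:
    """Positive when the model rates the team higher than the market does."""
    return model_prob - implied_prob


def is_flagged(model_prob: float, implied_prob: float, threshold: float = 0.05) -> bool:
    """True when the delta clears the >5% bar that earns a green circle."""
    return delta(model_prob, implied_prob) > threshold


# Hypothetical example: a +190 underdog that a model gives a 40% chance.
market = implied_probability(190)
print(f"implied: {market:.3f}, delta: {delta(0.40, market):+.3f}, "
      f"flagged: {is_flagged(0.40, market)}")
```

Under these assumptions a +190 price works out to roughly a 34.5% implied chance, so a 40% model probability would clear the 5% threshold.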
The Picks
There is a bit more math from there, but the TLDR is that anything with over a 5% delta has a 10% or better estimated ROI. Now, of course, this return is very high volatility, but if this model is any good, then it should more than even out over the course of the season to provide positive returns.
As such, the “picks” made by this column will be those games with a 10%+ ROI, and we’ll see where we end up!
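For the curious, here is one way the ROI estimate could be sketched in Python. This is an assumption on my part about the arithmetic, not the model's exact calculation: it treats expected ROI as the model's win probability times the payout, minus the chance of losing the stake. The function names and the 40% probability are hypothetical.

```python
def profit_per_dollar(moneyline: int) -> float:
    """Profit on a winning $1 bet at an American money line."""
    if moneyline == 0:
        raise ValueError("a money line of 0 is not a valid price")
    if moneyline > 0:
        return moneyline / 100
    return 100 / -moneyline


def expected_roi(model_prob: float, moneyline: int) -> float:
    """Expected return per dollar wagered, if the model's probability is right."""
    win = model_prob * profit_per_dollar(moneyline)
    loss = (1 - model_prob) * 1.0  # the full stake is lost on a losing bet
    return win - loss


# Hypothetical example: a 40% model probability on a +190 underdog.
print(f"expected ROI: {expected_roi(0.40, 190):.2%}")
```

One useful consequence of this formula: when the implied odds come straight from the money line, the expected ROI equals the delta divided by the implied probability. So a 5% delta on any team the market rates at 50% or lower works out to an ROI of at least 10%, which lines up with the rule of thumb above.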
I will note: week one is very tough to predict generally, and especially for a model whose inputs are often calculated from prior games.
That said, here are the picks! For scorekeeping, we’ll use the money lines at the time of writing and assume a $100 wager on each game. We’ll then keep a running tally over the season and see in the end how much money we lost (or, rather, how rich we would have been) if we actually made these bets.
Detroit Lions (+190)
Indianapolis Colts (+184)
Tampa Bay Buccaneers (+220)
Houston Texans (+398)
Arizona Cardinals (+270)
LA Rams (+199)
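The scorekeeping described above can be sketched in a few lines of Python. The money lines are the ones listed; the function names are hypothetical, and since the games have not been played, the example only shows the two extremes (every pick loses, every pick wins) rather than any real result.

```python
# Week one picks and their money lines at the time of writing.
PICKS = {
    "Detroit Lions": 190,
    "Indianapolis Colts": 184,
    "Tampa Bay Buccaneers": 220,
    "Houston Texans": 398,
    "Arizona Cardinals": 270,
    "LA Rams": 199,
}


def bet_profit(moneyline: int, won: bool, stake: float = 100.0) -> float:
    """Profit (positive) or loss (negative) on a single wager."""
    if not won:
        return -stake
    if moneyline > 0:
        return stake * moneyline / 100
    return stake * 100 / -moneyline


def weekly_tally(picks: dict[str, int], winners: set[str], stake: float = 100.0) -> float:
    """Net result for the week, given the set of picked teams that won."""
    return sum(bet_profit(line, team in winners, stake) for team, line in picks.items())


# Bounding cases only; actual results would be filled in after the games.
worst_case = weekly_tally(PICKS, winners=set())
best_case = weekly_tally(PICKS, winners=set(PICKS))
print(f"worst case: {worst_case:+.0f}, best case: {best_case:+.0f}")
```

With six $100 wagers, the week can lose at most $600, and a clean sweep at these prices would net $1,461.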
As mentioned, the model likely won’t be great with the first week, though I somewhat like this approach, which could be summed up as “there’s not much data, so there’s actually less certainty than the experts let on when they say they are 82% sure the Ravens will beat the Texans.”
So let’s see what happens!