By running a simulation of the rest of the NHL season 100,000 times, we can estimate each team's probabilities for the outcome of the season. Each game is simulated using the probabilities from the pre-game prediction model discussed below. For games further in the future, the model scores are regressed to the mean to account for the added uncertainty.
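The season simulation can be sketched as a Monte Carlo loop. This is a minimal sketch under several assumptions not taken from the text: the hypothetical helper `regress_to_mean` shrinks each game's home-win probability linearly toward 50% over a made-up 60-day horizon, every win is worth two points with no loser point for OT, and "making the playoffs" is simplified to finishing in the top `cutoff` teams by points.

```python
import random


def regress_to_mean(p, days_ahead, horizon=60):
    """Hypothetical rule: shrink a home-win probability toward 0.5,
    reaching exactly 0.5 for games `horizon` or more days away."""
    shrink = min(days_ahead / horizon, 1.0)
    return 0.5 + (p - 0.5) * (1.0 - shrink)


def simulate_season(games, current_points, n_sims=100_000, cutoff=8, seed=0):
    """Estimate each team's chance of finishing in the top `cutoff` by points.

    games: list of (home, away, home_win_prob, days_ahead) tuples
    current_points: dict mapping team -> points earned so far
    """
    rng = random.Random(seed)
    top_finishes = {team: 0 for team in current_points}
    for _ in range(n_sims):
        points = dict(current_points)
        for home, away, p, days_ahead in games:
            p_adj = regress_to_mean(p, days_ahead)
            winner = home if rng.random() < p_adj else away
            points[winner] += 2  # simplification: no loser point for OT losses
        # Ties in points are broken by dict order (a simplification).
        ranked = sorted(points, key=points.get, reverse=True)
        for team in ranked[:cutoff]:
            top_finishes[team] += 1
    return {team: count / n_sims for team, count in top_finishes.items()}


odds = simulate_season(
    [("BOS", "TOR", 0.58, 3), ("TOR", "MTL", 0.52, 40)],
    {"BOS": 60, "TOR": 59, "MTL": 58},
    n_sims=10_000,
    cutoff=2,
)
print(odds)
```

A fixed seed keeps runs reproducible; a real season would also need the regulation/OT split and tiebreakers that this sketch leaves out.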
Pre-Game Prediction Model
The win prediction model was built on regular season and postseason NHL games from the 2007-2008 season to the 2014-2015 season. The model predicts the home team's chance of winning the game. Games that went to overtime were not used to build the model, and each team's first 20 games of each season were also excluded. Both logistic regression and gradient boosting were used to build components of the model; the previous version of the model used only logistic regression. The Power Rankings on MoneyPuck.com are driven solely by the win prediction model.
The data used to predict each game include each team's performance on a range of statistics in the season up to the date of that game. Older games are given less weight, and the weighting decays linearly. For example, when predicting the result of a team's 41st game of the season, the team's 40th game is given twice the weight of its 20th game. Several game-weighting techniques were tested while building the model, from weighting each game equally to weighting recent games exponentially more. Using only the last 20 or 30 games of data was also evaluated in combination with each of these methods. Ultimately, using full season-to-date data with a linear decay in game importance proved to have the most predictive power.
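The linear weighting can be illustrated with a short sketch. The helper names `linear_weights` and `weighted_average` are hypothetical; the sketch assumes game k of the season (counting from 1) gets weight k, which matches the example above where game 40 counts twice as much as game 20.

```python
def linear_weights(n_games):
    """Weight for game k (1-based, oldest first) is simply k."""
    return list(range(1, n_games + 1))


def weighted_average(per_game_values):
    """Linearly weighted average of per-game values, oldest game first."""
    weights = linear_weights(len(per_game_values))
    total = sum(w * v for w, v in zip(weights, per_game_values))
    return total / sum(weights)


weights = linear_weights(40)  # games 1-40, used to predict game 41
print(weights[39] / weights[19])  # game 40 vs. game 20 -> 2.0
```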
There are three main components of the win prediction model: the Home submodel, the Away submodel, and the Meta Model:
The Home submodel uses only statistics describing the home team and predicts the likelihood the home team will win the game.
The Away submodel similarly uses only statistics describing the away team and predicts the likelihood the away team will win the game.
The Meta Model combines the Home submodel, the Away submodel, home ice advantage, and each team's days of rest into one overall score.
We also use a simple 'Tie Game' model, which predicts the probability that the game will go to overtime. This model is simply a function of the Meta Model score. The default Tie Game model score is 25%. For every percentage point the Meta Model score is away from a 50/50 game, the chance of the game going to OT drops by 0.2 percentage points. For example, a game with 55/45 odds is given a 24% chance of going to OT. If a game goes to OT, we give each team equal odds of winning, whether in regular OT or a shootout.
The home team's overall odds of winning the game are then calculated as follows:
Home team chance of winning in regulation = (1 - [Tie Game Model Score]) * [Meta Model Score]
Home team chance of winning in OT = [Tie Game Model Score] * 50%
Home team chance of winning = chance of winning in regulation + chance of winning in OT
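The Tie Game model and the three formulas above fit in a few lines of code. This sketch assumes, as the 55/45 example implies, that the Meta Model Score is already expressed as the home team's win probability between 0 and 1; the function names are hypothetical.

```python
def tie_game_prob(meta_score):
    """25% baseline, minus 0.2 points per point the score is away from 50%."""
    return 0.25 - 0.2 * abs(meta_score - 0.5)


def home_win_prob(meta_score):
    """Overall home win probability: regulation win plus a coin-flip OT/shootout."""
    tie = tie_game_prob(meta_score)
    regulation = (1 - tie) * meta_score
    overtime = tie * 0.5
    return regulation + overtime


print(round(tie_game_prob(0.55), 4))   # 0.24
print(round(home_win_prob(0.55), 4))   # 0.538
```

Because OT is treated as a coin flip, the overall probability is pulled slightly toward 50% relative to the Meta Model Score (0.538 vs. 0.55 in this example).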
Variables In Both the Home and Away Team Models:
Expected goals from non-rebound shots (Even Strength Adjusted) - Each shot the team has taken at even strength is given a probability of being a goal. This variable is the sum of all the team's shot probabilities divided by the sum of all shot probabilities against them. Unlike Corsi or Fenwick, this variable factors in shot quality. It is the most influential variable in the model. Rebound shots were excluded because counting only non-rebound shots proved more predictive of future wins. This is likely because getting rebounds involves more luck than other shots do, which makes them less repeatable.
Expected goals in power play and penalty kill situations - Similar to the expected goals variable above, but looking only at man-advantage situations. The team's expected goals on 5-on-4 power plays are divided by the total amount of power play time they've had. Likewise, the total expected goals against them on penalty kills are divided by the amount of time the team has spent killing penalties. The final variable is the power play expected goals metric divided by the penalty kill expected goals metric.
Unblocked Shot Attempts For % (Even Strength Adjusted) - This variable, also known as score-adjusted Fenwick, describes the team's share of unblocked shot attempts at even strength in their previous games. The count of unblocked shot attempts is adjusted for the game score at the time each attempt occurred. For example, teams generally take more unblocked shot attempts when trailing by a goal, so shot attempts in that situation are discounted by about 9%. Unblocked shot attempts include shots on goal and shots that miss the net. In the older version of the model, which did not include expected goals, score-adjusted Fenwick was the most important variable. Like all the other variables below, this variable is weighted to give more weight to more recent games in the season.
Save Percentage - Simply the team's save percentage for all goaltenders in all situations.
Shooting Percentage (Even Strength) - The team's shooting percentage in even strength situations.
Share of Power Play Time - The team's share of power play time, compared with the amount of time they spend on the penalty kill.
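Several of these variables are simple ratios, which can be sketched as below. This is a hedged illustration, not MoneyPuck's code: it assumes the linear game weighting is applied to the numerator and denominator totals (rather than to per-game ratios), and the score-adjustment table `SCORE_ADJUSTMENT` is hypothetical apart from the roughly 9% discount for trailing by one goal mentioned above. Other score states are left at 1.0 as placeholders, and game weighting is omitted from the Fenwick function for brevity.

```python
# Score differential from the shooting team's perspective -> multiplier.
# Only the -1 entry (~9% discount when trailing by one) comes from the text;
# every other score state defaults to 1.0 as a placeholder.
SCORE_ADJUSTMENT = {-1: 0.91}


def weighted_total(per_game):
    """Sum of per-game values with linear weights (game k gets weight k)."""
    return sum(k * v for k, v in enumerate(per_game, start=1))


def xg_ratio(xg_for_by_game, xg_against_by_game):
    """Even-strength non-rebound xG for divided by xG against."""
    return weighted_total(xg_for_by_game) / weighted_total(xg_against_by_game)


def special_teams_xg(pp_xg, pp_minutes, pk_xga, pk_minutes):
    """(PP xG per PP minute) divided by (PK xG against per PK minute)."""
    pp_rate = weighted_total(pp_xg) / weighted_total(pp_minutes)
    pk_rate = weighted_total(pk_xga) / weighted_total(pk_minutes)
    return pp_rate / pk_rate


def score_adjusted_fenwick_pct(for_diffs, against_diffs):
    """Each list holds the shooting team's score differential at each
    unblocked even-strength attempt; returns the adjusted share for."""
    adj_for = sum(SCORE_ADJUSTMENT.get(d, 1.0) for d in for_diffs)
    adj_against = sum(SCORE_ADJUSTMENT.get(d, 1.0) for d in against_diffs)
    return adj_for / (adj_for + adj_against)


def pp_time_share(pp_minutes, pk_minutes):
    """Share of special-teams time spent on the power play."""
    return pp_minutes / (pp_minutes + pk_minutes)
```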
Variables In Meta Model:
Rest Category - Set to 1 if the home team played yesterday but the away team didn't. Set to -1 if the away team played yesterday and the home team didn't. Otherwise set to 0. Surprisingly, when versions of this variable were being built, a lack of rest days affected the home team as much as the away team.
This variable can be the difference between the home team having a 51% chance of winning and a 58% chance of winning against an opponent of the same quality in the Power Rankings.
Home Team Model - as described above
Away Team Model - as described above
Meta Model Coefficients: (Home Team Model / Away Team Model) * 3.7781 + Rest Category * 0.1406 - 4.3546
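The meta model formula can be written out directly. The coefficients and the Rest Category rules come from the text; the `logistic` step is an assumption (the text says logistic regression is part of the model but does not state that this linear score is passed through a logistic link), so treat the converted probability as illustrative only. The submodel outputs 0.55 and 0.45 in the usage line are made up.

```python
import math


def rest_category(home_played_yesterday, away_played_yesterday):
    """Rest Category exactly as defined above."""
    if home_played_yesterday and not away_played_yesterday:
        return 1
    if away_played_yesterday and not home_played_yesterday:
        return -1
    return 0


def meta_linear_score(home_model, away_model, rest):
    """Published coefficients: ratio of submodel scores plus a rest term."""
    return (home_model / away_model) * 3.7781 + rest * 0.1406 - 4.3546


def logistic(z):
    """Assumed link from linear score to probability (not stated in the text)."""
    return 1.0 / (1.0 + math.exp(-z))


z = meta_linear_score(0.55, 0.45, rest_category(False, False))
print(round(z, 4), round(logistic(z), 4))
```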
Shot Prediction Expected Goals Model
This model predicts the probability of each shot being a goal. Key factors in the model include the distance from the net, the angle of the shot, the type of shot, and what happened before the shot. The model was built on over 50,000 goals and 800,000 shots with location data from NHL regular season and playoff games from the 2007-2008 to 2014-2015 seasons. By adding up the probabilities of all of a team's shots in a game, we can calculate the team's expected goals for that game. The model was built using gradient boosting. MoneyPuck's expected goals model uses a different variable strategy than other expected goals models, such as those from Corsica Hockey or HockeyGraphs.com. It does not explicitly use variables for rebounds or rush shots. Instead, it looks at the 'speed' between events: the distance on the ice between the shot and the event before it, divided by the time that has elapsed. For rebound shots, the model also looks at the change in angle between the two shots divided by the time between them. The illustrations below describe how the speed variables are created.
Using the 2015-2016 season as an out-of-sample test, the 15% of shots the model rated highest accounted for over 50% of the goals that season.
In general, the shots with the highest goal probability are quick rebound shots close to the net, where there has been a large change in shot angle from the original shot.
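The "15% of shots, over 50% of goals" check can be reproduced on any labeled shot sample with a short function. The name `goal_share_of_top_shots` is hypothetical, and the shots in the usage example are made up purely to show the call.

```python
def goal_share_of_top_shots(probs, goals, top_fraction=0.15):
    """Share of all goals scored on the `top_fraction` of shots with the
    highest predicted goal probability. `goals` holds 1 for a goal, else 0."""
    ranked = sorted(zip(probs, goals), key=lambda pair: pair[0], reverse=True)
    n_top = round(len(ranked) * top_fraction)
    top_goals = sum(goal for _, goal in ranked[:n_top])
    return top_goals / sum(goals)


probs = [0.9, 0.8, 0.7] + [0.05] * 17   # 20 made-up shots
goals = [1, 1, 0] + [1] + [0] * 16      # 3 goals in total
print(goal_share_of_top_shots(probs, goals))  # top 3 shots hold 2 of 3 goals
```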
Variables In Shot Prediction Model:
1.) Shot Distance From Net
2.) Time Since Last Game Event
3.) Shot Type (Slap, Wrist, Backhand, etc.)
4.) Speed From Previous Event
5.) Shot Angle
6.) East-West Location on Ice of Last Event Before the Shot
7.) If Rebound, Difference in Shot Angle Divided by Time Since Last Shot
8.) Last Event That Happened Before the Shot (Faceoff, Hit, etc.)
9.) Other Team's Number of Skaters on Ice
10.) East-West Location on Ice of Shot
11.) Man Advantage Situation
12.) Time Since Current Power Play Started
13.) Distance From Previous Event
14.) North-South Location on Ice of Shot
15.) Shooting on Empty Net
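A few of the geometric variables above (shot distance, shot angle, speed from the previous event, and the rebound angle change per second) can be sketched in a few lines. The coordinate system is an assumption: feet, center ice at the origin, and every shot oriented toward a net at (89, 0). The one-second floor on elapsed time is also a hypothetical guard against division by zero, not something the text specifies.

```python
import math

NET = (89.0, 0.0)  # assumed net location in feet, center ice at (0, 0)


def shot_distance(x, y):
    """Straight-line distance from the shot to the net."""
    return math.hypot(NET[0] - x, NET[1] - y)


def signed_angle(x, y):
    """Angle in degrees off the net's center line; the sign gives the side."""
    return math.degrees(math.atan2(y - NET[1], NET[0] - x))


def shot_angle(x, y):
    """Unsigned shot angle: 0 means straight on."""
    return abs(signed_angle(x, y))


def speed_from_previous_event(shot_xy, prev_xy, seconds_elapsed):
    """Distance between the two events divided by elapsed time (floored at 1 s)."""
    return math.dist(shot_xy, prev_xy) / max(seconds_elapsed, 1.0)


def rebound_angle_speed(shot_xy, prev_shot_xy, seconds_elapsed):
    """Change in shot angle between a rebound and the original shot, per second."""
    change = abs(signed_angle(*shot_xy) - signed_angle(*prev_shot_xy))
    return change / max(seconds_elapsed, 1.0)


print(round(shot_distance(79, 10), 2), round(shot_angle(79, 10), 1))  # 14.14 45.0
```

Using the signed angle for the rebound variable means a cross-crease rebound (from one side of the net to the other) registers as a large angle change, which is the situation the text identifies as most dangerous.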
Live In-Game Model
The in-game model is driven primarily by the current score and the time left. By looking at the outcomes of historical games based on score and time left, we can create probabilities for tied, one-goal, two-goal, three-goal, and 4+ goal difference situations for any amount of time left in a game. When a penalty occurs, the model calculates each team's win probability if goal(s) are scored on the power play and if they are not, then weights those probabilities by the chance of each happening. The pre-game model is also used to calculate the win probability: at the start of the game it is influential, but it is gradually weighted less as the game goes on. In the previous version of the model, the teams' performance during the game (such as scoring chances that didn't result in a goal) was factored into the in-game prediction. However, this data did not add significant value beyond the new pre-game model and was removed.
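The two mechanics described above, weighting power play scenarios by their likelihood and fading out the pre-game model, can be sketched as follows. Both functions are hypothetical illustrations: the text does not say how quickly the pre-game model's influence declines, so the linear fade over 60 minutes of regulation is an assumption, and the scenario probabilities in the usage example are made up.

```python
def penalty_adjusted_prob(scenarios):
    """scenarios: list of (probability the scenario happens,
    team's win probability if it does). Returns the weighted average."""
    total = sum(p for p, _ in scenarios)
    return sum(p * win for p, win in scenarios) / total


def blended_win_prob(pregame_prob, score_time_prob, seconds_left,
                     regulation_seconds=3600):
    """Assumed linear fade: the pre-game model gets full weight at puck drop
    and no weight at the end of regulation."""
    w = min(max(seconds_left / regulation_seconds, 0.0), 1.0)
    return w * pregame_prob + (1 - w) * score_time_prob


# Made-up scenarios: no PP goal (80%), one PP goal for the team (20%).
during_pp = penalty_adjusted_prob([(0.8, 0.55), (0.2, 0.70)])
print(round(blended_win_prob(0.60, during_pp, seconds_left=1800), 3))
```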
By running the season simulations for each possible game outcome (a regulation win, regulation loss, OT win, or OT loss), we can see the impact on playoff odds in real time as the odds of the different outcomes change during the game.
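Combining the per-outcome simulations with the live game odds is a weighted average, sketched below under the assumption that the simulations have already produced one playoff probability per outcome. The outcome keys and all numbers are hypothetical.

```python
def live_playoff_odds(outcome_probs, playoff_odds_by_outcome):
    """Weight each outcome's simulated playoff odds by the live chance of
    that outcome; outcome_probs should sum to 1."""
    return sum(p * playoff_odds_by_outcome[outcome]
               for outcome, p in outcome_probs.items())


live = {"reg_win": 0.40, "ot_win": 0.10, "ot_loss": 0.10, "reg_loss": 0.40}
simulated = {"reg_win": 0.80, "ot_win": 0.75, "ot_loss": 0.70, "reg_loss": 0.60}
print(round(live_playoff_odds(live, simulated), 3))  # 0.705
```

Only the four outcome probabilities change during the game, so the expensive season simulations need to run just once per outcome.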
So does the model actually work?
By running the model on an out-of-time validation sample (the 2015-2016 season), we can see how well the model performs.
Below is a graph of the percentage of games in which the team considered the 'favorite' ended up winning. Before games start, we can predict ~57% correctly. As games continue, our confidence gradually rises to ~89% at the end of regulation. For games that go to OT, the outcome is basically a coin flip.
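The accuracy figures above can be computed with a simple count. This sketch assumes predictions are stored as the home team's win probability, treats the team above 50% as the favorite, and skips exact 50/50 games; the function name and inputs are hypothetical.

```python
def favorite_accuracy(home_win_probs, home_won):
    """Fraction of games (excluding exact 50/50 calls) won by the favorite."""
    correct = counted = 0
    for p, won in zip(home_win_probs, home_won):
        if p == 0.5:
            continue  # no favorite to score
        counted += 1
        home_favored = p > 0.5
        if home_favored == won:
            correct += 1
    return correct / counted if counted else float("nan")


print(favorite_accuracy([0.6, 0.4, 0.7, 0.5], [True, False, False, True]))
```

Running the same count on predictions taken at different points in the game traces out the curve described above, from the pre-game rate to the end-of-regulation rate.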
MoneyPuck was created by Peter Tanner using mostly Python and is run on AWS. He can be contacted at email@example.com, @pr_tanner, or peter-tanner.com, which has more research and analytics. The MoneyPuck.com Twitter bot, which tweets probability changes after key goals, can be followed at @MoneyPuckdotcom.