Predict the Best NHL Fantasy Lineup

jon macmillan
senior data analyst

Winter is coming. Well, for some of us, winter is already here. What does that mean? It is time to start paying attention to the NHL, or more importantly, NHL fantasy. Surprisingly, despite growing up in northern New Hampshire, I never played hockey, at least not organized hockey. We played pond hockey from time to time, but I never played official ice hockey until a couple of years ago, when I joined a league. Once you have played the game, you realize just how impressive the NHL really is.

And that is why, when I came across an NHL dataset on Kaggle, I decided to put together some data-driven fantasy lineups. Why NHL fantasy? First and foremost, this is the most complete sports dataset I have seen that is free and easily accessible. Second, there is far less competition, and, as the dataset contributor mentions, advanced data-driven fantasy statistics in hockey are still in their infancy.

So with that, I turned to the data. Using the Kaggle data, I built a dataset in Veera Construct to help predict the fantasy points for every active player in the NHL. This entailed going back through the past few years of stats and identifying key indicators of a player's success, which turned out to be different for each position.
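One simple way to screen for position-specific indicators is to rank candidate measures by their correlation with fantasy points separately for each position. The sketch below does that in plain Python. It is a swapped-in illustration, not necessarily how Veera Construct or Veera Predict identified the indicators, and the column names (`position`, `prev_goals`, `prev_blocks`, `fantasy_pts`) and numbers are made up. Pearson correlation only captures linear, one-at-a-time relationships, so treat it as a first screening pass rather than a model.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def rank_indicators(rows, target, position):
    """Rank every candidate measure by |r| with the target, within one position.

    Assumes at least one row exists for the requested position.
    """
    group = [r for r in rows if r["position"] == position]
    measures = [k for k in group[0] if k not in (target, "position")]
    ys = [r[target] for r in group]
    scores = {m: pearson([r[m] for r in group], ys) for m in measures}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical rows: one per player-season, with last season's stats as predictors.
rows = [
    {"position": "C", "prev_goals": 10, "prev_blocks": 30, "fantasy_pts": 100},
    {"position": "C", "prev_goals": 20, "prev_blocks": 10, "fantasy_pts": 150},
    {"position": "C", "prev_goals": 30, "prev_blocks": 40, "fantasy_pts": 200},
    {"position": "C", "prev_goals": 40, "prev_blocks": 20, "fantasy_pts": 250},
    {"position": "D", "prev_goals": 5, "prev_blocks": 50, "fantasy_pts": 60},
    {"position": "D", "prev_goals": 3, "prev_blocks": 100, "fantasy_pts": 70},
    {"position": "D", "prev_goals": 6, "prev_blocks": 150, "fantasy_pts": 80},
    {"position": "D", "prev_goals": 2, "prev_blocks": 200, "fantasy_pts": 90},
]

for pos in ("C", "D"):
    print(pos, rank_indicators(rows, "fantasy_pts", pos))
```

With this toy data, goals come out on top for centers while blocks lead for defensemen, which is the kind of position-by-position difference described above.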

Data Cleanup

To prepare this data, I connected to all of the Kaggle extracts and incorporated them into a visual data preparation process in Veera Construct. NHL player data is incredibly detailed, but I wanted to start out fairly basic, so I set up my dataset to predict total fantasy points for each player for a given season. Eventually, I would like to build models for individual game predictions, which would require more intensive data preparation.
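For readers who want to try the season-level setup outside Veera Construct, here is a minimal Python sketch of rolling game-level rows up to one row per player per season. The field names (`player_id`, `season`, `goals`, `assists`, `toi_sec`) and the sample rows are hypothetical and are not the actual Kaggle schema.

```python
from collections import defaultdict

# Hypothetical game-level rows; field names are assumptions, not the Kaggle schema.
game_rows = [
    {"player_id": 1, "season": 2017, "goals": 1, "assists": 0, "toi_sec": 1250},
    {"player_id": 1, "season": 2017, "goals": 0, "assists": 2, "toi_sec": 1260},
    {"player_id": 2, "season": 2017, "goals": 0, "assists": 1, "toi_sec": 1190},
]


def season_totals(rows):
    """Sum game-level stats into one record per (player_id, season)."""
    totals = defaultdict(lambda: {"games": 0, "goals": 0, "assists": 0, "toi_sec": 0})
    for row in rows:
        t = totals[(row["player_id"], row["season"])]
        t["games"] += 1  # each row is one game played
        t["goals"] += row["goals"]
        t["assists"] += row["assists"]
        t["toi_sec"] += row["toi_sec"]
    return dict(totals)


totals = season_totals(game_rows)
print(totals[(1, 2017)])
```

Keeping a games-played count alongside the totals makes it easy to derive per-game averages later.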

The first thing I looked at was the actual scoring system. Every fantasy platform scores certain outcomes differently. In this case, I was going to be competing on FanDuel, and the contest I decided to enter scored as follows:

With this in mind, I set out to prepare the data as well as I could, to identify not only how these scoring categories drive overall fantasy points but also what other factors might influence them. If a player scored a large number of goals in the previous year, we could expect a similar outcome the following year. That wouldn't be hard to determine on my own, but I wanted to find other influencers that may not be obvious at first glance. For instance, we can assume that players with more time on ice will more than likely record more goals and more assists, which are by far the biggest contributors to fantasy points.
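As a rough illustration of how a scoring system turns a stat line into fantasy points, the sketch below computes a weighted sum of counting stats. The weights in `HYPOTHETICAL_WEIGHTS` are placeholders chosen for illustration only; they are not FanDuel's actual values, so substitute the scoring for your own contest.

```python
# Placeholder weights: NOT FanDuel's real scoring, purely for illustration.
HYPOTHETICAL_WEIGHTS = {"goals": 3.0, "assists": 2.0, "shots": 0.5, "blocks": 0.5}


def fantasy_points(stats, weights=HYPOTHETICAL_WEIGHTS):
    """Weighted sum of counting stats; categories missing from the line count as zero."""
    return sum(w * stats.get(stat, 0) for stat, w in weights.items())


line = {"goals": 30, "assists": 40, "shots": 250, "blocks": 20}
print(fantasy_points(line))
```

Because the score is linear in each stat, anything that raises goals and assists, such as more ice time, feeds straight into the total.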

To capture all of these factors, I looked at what was available in the data and how I could massage it into a single modeling file that would be most useful for predicting fantasy points. I built a process in Veera Construct that merged all of these tables together and then calculated the previous year's stats, both season totals and game-by-game averages, as well as the two seasons before that, to see the trend over time.
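The previous-year and trend features can be sketched in a few lines of Python. This is a hypothetical version, not the actual Veera Construct process: `history` maps each season to a made-up summary with `games`, `goals`, and `avg_toi_min` (average minutes per game), and the two earlier seasons are combined with a simple unweighted average.

```python
def lag_features(history, target_season):
    """Build predictors for target_season using only the seasons before it.

    `history` maps season -> {"games", "goals", "avg_toi_min"} for one player
    (a hypothetical per-season summary, not the Kaggle schema).
    """
    prev = history[target_season - 1]
    feats = {
        "prev_goals": prev["goals"],                           # season total
        "prev_goals_per_game": prev["goals"] / prev["games"],  # game-by-game average
        "prev_avg_toi_min": prev["avg_toi_min"],
    }
    # The two seasons before the previous one, if the player has them.
    earlier = [history[s] for s in (target_season - 2, target_season - 3) if s in history]
    if earlier:
        earlier_toi = sum(s["avg_toi_min"] for s in earlier) / len(earlier)
        feats["earlier_avg_toi_min"] = earlier_toi
        feats["toi_trend"] = prev["avg_toi_min"] - earlier_toi  # positive = growing role
    return feats


# Made-up numbers for one player.
history = {
    2015: {"games": 82, "goals": 40, "avg_toi_min": 19.0},
    2016: {"games": 80, "goals": 30, "avg_toi_min": 19.0},
    2017: {"games": 82, "goals": 41, "avg_toi_min": 20.5},
}
feats = lag_features(history, 2018)
print(feats)
```

A player with no previous season, such as a rookie, raises a `KeyError` here and would need separate handling.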

For instance, Tyler Seguin is currently one of the premier centers in the league, ranked eighth overall on nhl.com. In the 2017 season, he played all 82 games and averaged about 20.9 minutes of ice time, the seventh most among centers. If we look back at the 2016 and 2015 seasons, he averaged 18.9 minutes. A two-minute increase over his average from those two seasons is huge and may indicate a player on an upward trend.

In comparison, John Tavares of the Toronto Maple Leafs is ranked sixth overall on that same list. However, Tavares averaged 19.9 minutes of ice time in 2017, down from 20.4 minutes in the 2016 and 2015 seasons. So while Tavares is ranked higher and is the more expensive player on FanDuel ($8,500 compared to $7,900 for Seguin), Seguin looks like the better bet here, judging by time on ice alone.
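The Seguin and Tavares comparison boils down to a simple subtraction. The short Python sketch below reproduces it using only the figures quoted in this post; `toi_trend` is a hypothetical helper name, and rounding to one decimal matches the precision of the quoted averages.

```python
def toi_trend(recent_avg, prior_avg):
    """Change in average ice time (minutes per game), rounded to one decimal."""
    return round(recent_avg - prior_avg, 1)


# (2017 avg TOI, 2015-2016 avg TOI, FanDuel salary) as quoted in this post
players = {
    "Tyler Seguin": (20.9, 18.9, 7900),
    "John Tavares": (19.9, 20.4, 8500),
}

for name, (recent, prior, salary) in players.items():
    print(f"{name}: {toi_trend(recent, prior):+.1f} min/game trend, ${salary:,}")
```

This prints a +2.0 trend for Seguin at $7,900 and a -0.5 trend for Tavares at $8,500: the cheaper player is the one whose role is growing.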

In the end, I had about 120 different measures, which I output to a modeling file that I can then load into Veera Predict to analyze against our outcome, total fantasy points. This is what the process looks like in Veera Construct:

To see how my analysis turned out and whether or not I am still working at Rapid Insight or am currently living on my own island that I bought with my NHL fantasy winnings, stay tuned for the next blog post.