After completing the DevPost project, I decided to take what I learned and try to do the same with the NFL. I went in knowing that I wouldn’t do very well, but that didn’t stop me from trying to get close.
My first step was figuring out how to collect data. For the college basketball project, I simply scraped the NCAA website. Since that project’s inputs were based only on scoring, it was pretty straightforward. For the NFL, I wanted to use more than just the score: I wanted some offensive and defensive stats as well.
To grab the data, I used nflscrapR by Maksim Horowitz, who had collected the stats for every game since 2009 in JSON format. I imported that into a C# project I had created, and a few hundred lines of formatting code later, I had the data.
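The original import was written in C#. As a language-neutral illustration, here is a minimal Python sketch of the same idea: flattening one game’s JSON into a flat row of numbers plus the two final scores. The JSON layout and field names (`home`, `away`, `team`, `score`, `pass_yds`, `rush_yds`, `turnovers`) are hypothetical and do not reflect the actual nflscrapR output schema.

```python
import json

# Hypothetical, simplified per-game JSON; the real nflscrapR output is structured differently.
SAMPLE_GAME = """
{
  "home": {"team": "NE", "score": 27, "pass_yds": 310, "rush_yds": 95, "turnovers": 1},
  "away": {"team": "NYJ", "score": 17, "pass_yds": 220, "rush_yds": 120, "turnovers": 2}
}
"""

# Hypothetical stat columns copied from each side of the game.
STAT_KEYS = ("pass_yds", "rush_yds", "turnovers")


def game_to_row(raw_json):
    """Flatten one game's JSON into (features, (home_score, away_score))."""
    game = json.loads(raw_json)
    home, away = game["home"], game["away"]
    features = [home[k] for k in STAT_KEYS] + [away[k] for k in STAT_KEYS]
    targets = (home["score"], away["score"])
    return features, targets


features, targets = game_to_row(SAMPLE_GAME)
print(features)  # [310, 95, 1, 220, 120, 2]
print(targets)   # (27, 17)
```

Keeping a fixed key order means every game produces columns in the same positions, which is what a model’s input layer expects.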
But I wanted more. To increase my training data, I wanted games from 2000 onward, and for those earlier seasons I used Pro Football Reference. While the JSON was nice and easy, this was not: I had to download the HTML and then use string manipulation to extract the data.
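The post used plain string manipulation on the downloaded HTML. A slightly sturdier alternative, swapped in here for illustration, is Python’s standard-library `html.parser`. The sketch assumes each table cell carries a `data-stat` attribute naming its column. The sample markup, the made-up teams and scores, and the column names (`winner`, `loser`, `pts_win`, `pts_lose`) are simplified stand-ins, not a verbatim copy of Pro Football Reference’s pages.

```python
from html.parser import HTMLParser


class StatTableParser(HTMLParser):
    """Collect table rows as dicts keyed by each cell's data-stat attribute."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # dict for the <tr> currently being read
        self._stat = None  # data-stat name of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = {}
        elif tag in ("td", "th") and self._row is not None:
            self._stat = dict(attrs).get("data-stat")
            if self._stat:
                self._row[self._stat] = ""

    def handle_data(self, data):
        # Only keep text that sits inside a labeled cell (including nested <a> links).
        if self._row is not None and self._stat:
            self._row[self._stat] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._stat = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


# Made-up sample row for illustration only.
SAMPLE_HTML = """
<table>
  <tr>
    <td data-stat="winner"><a href="#">Team A</a></td>
    <td data-stat="loser">Team B</td>
    <td data-stat="pts_win">30</td>
    <td data-stat="pts_lose">20</td>
  </tr>
</table>
"""

parser = StatTableParser()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.rows)
```

Keying cells by attribute rather than by position means a column being added or reordered on the page doesn’t silently shift every value into the wrong field.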
I now have 4,848 games to use. With an 80/20 split, that gives me 3,878 games for training and 970 for testing.
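The split arithmetic works out as follows: 80% of 4,848 is 3,878.4, so 3,878 games go to training and the remaining 970 to testing. The post doesn’t say how games were assigned to each set. This sketch assumes a seeded random shuffle before splitting, and `train_test_split` here is a hypothetical helper, not a library function.

```python
import random


def train_test_split(items, train_fraction=0.8, seed=42):
    """Shuffle a copy of items with a fixed seed and split it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)  # int() truncates 3878.4 down to 3878
    return shuffled[:cut], shuffled[cut:]


games = list(range(4848))  # stand-in for the 4,848 game records
train, test = train_test_split(games)
print(len(train), len(test))  # 3878 970
```

Fixing the seed keeps the same games in the test set across runs, so changes in MAE reflect changes to the network rather than a different split.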
For the model, I again used TensorFlow v2 and Keras. Since this is a regression problem, I am using MSE (mean squared error) as my loss function and MAE (mean absolute error) as my accuracy metric.
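In Keras this choice is typically expressed as `model.compile(optimizer=..., loss="mse", metrics=["mae"])`. To show what the two numbers mean, here is a plain-Python sketch that computes both on a handful of made-up predicted and actual scores. Squaring makes MSE punish one large miss much more than several small ones. MAE stays in points, which is what makes it readable as “how many points off.”

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences (the training loss)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    """Average of absolute differences, in points (the tracked metric)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up final scores and model predictions, for illustration only.
actual = [24, 17, 31, 10]
predicted = [21, 20, 23, 12]

print(mean_squared_error(actual, predicted))   # 21.5
print(mean_absolute_error(actual, predicted))  # 4.0
```

Here the single 8-point miss contributes 64 of the 86 total squared error, but only 8 of the 16 total absolute error.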
I will most likely be tweaking the number of network layers and nodes for a while to see if I can improve accuracy. Right now, my MAE is around 8 points. I would like to get it under a touchdown (7 points).