For data collection I used a C# application that downloaded the NCAA results for the last 4 seasons. It was pretty straightforward string manipulation. It wasn’t until later that I discovered that some team names aren’t written the same way everywhere: “State” versus “St.”, for example. At first, I went back and cleaned up the CSV by hand, but then decided that it would be smarter to just handle it in code. That way I don’t have to worry about the same problem in future data.
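As a rough illustration of handling the name mismatch in code, here is a minimal Python sketch (the original collector was written in C#, and its actual cleanup logic isn't shown here). The function name `normalize_team_name` and the rule of expanding only a trailing "St." are assumptions for the example. A leading "St." is left alone because it usually means "Saint".

```python
import re

# Matches "St" or "St." only at the very end of the name.
_TRAILING_ST = re.compile(r"\bSt\.?$")


def normalize_team_name(name: str) -> str:
    """Return a canonical team name (hypothetical helper).

    Collapses repeated whitespace and expands a trailing "St."/"St"
    to "State". A leading "St." (as in "St. John's") is not touched.
    """
    cleaned = " ".join(name.split())
    return _TRAILING_ST.sub("State", cleaned)


if __name__ == "__main__":
    for raw in ["Michigan St.", "Michigan State", "St. John's"]:
        print(raw, "->", normalize_team_name(raw))
```

Running every name through one function like this before it hits the CSV means both spellings end up as the same key.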
I fought with the structure (layers, nodes, optimizer, activations) for a while. I knew that I didn’t want more than a few layers, and with only 20k games I didn’t want a lot of nodes either. I settled into a sweet spot with two hidden layers of 32 nodes each (32/32). I had dropout layers at first but decided to remove them.
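To put the "not a lot of nodes for 20k games" reasoning into numbers, here is a small Python sketch that counts the trainable weights in a fully connected network. The 20 input features and the single output node are assumptions made up for the example; only the 32/32 hidden layers and the 20k games come from the text above.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the width of every layer in order,
    e.g. [inputs, hidden1, hidden2, outputs].
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # weight matrix between the two layers
        total += n_out         # one bias per node in the next layer
    return total


if __name__ == "__main__":
    n_features = 20   # assumption: the real input size is not stated
    n_games = 20_000  # roughly the dataset size mentioned above
    params = dense_param_count([n_features, 32, 32, 1])
    print(params, "parameters,", round(n_games / params, 1), "games per parameter")
```

Under those assumptions the network has well under two thousand parameters, so there are several games per parameter. Wider layers would use up that margin quickly.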
My network is set up for a Home/Away structure, and a tournament game doesn’t have a home team. At first I assumed that it wouldn’t matter as long as I used each team’s “away” stats. This turned out not to be true. To work around this issue, I ran the prediction twice, once with each team in the home slot, and averaged the two results.
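The run-it-twice workaround could look roughly like the Python sketch below. It assumes the model is wrapped in a callable `predict(home, away)` that returns the probability that the home team wins. That interface, and the toy model used to exercise it, are hypothetical and not taken from the actual network.

```python
def neutral_site_probability(predict, team_a, team_b):
    """Estimate P(team_a beats team_b) on a neutral court.

    predict(home, away) is assumed to return P(home team wins).
    Each team takes a turn in the home slot and the two estimates
    of "team_a wins" are averaged.
    """
    a_wins_as_home = predict(team_a, team_b)
    a_wins_as_away = 1.0 - predict(team_b, team_a)
    return (a_wins_as_home + a_wins_as_away) / 2.0


# Toy stand-in for the trained network: strength gap plus a home edge.
_STRENGTH = {"A": 2.0, "B": 1.0}


def toy_predict(home, away):
    return 0.5 + 0.1 * (_STRENGTH[home] - _STRENGTH[away]) + 0.05


if __name__ == "__main__":
    print(neutral_site_probability(toy_predict, "A", "B"))
```

In the toy model the home edge pushes the first prediction up and the second one down by the same amount, so it drops out of the average.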
Overall, I liked the experience and will probably try to do an NFL version. Just keep in mind that MILLIONS of dollars are spent each week in Vegas, so you will never consistently beat their lines. But you can easily find some weaknesses and beat them in a few games.