Please go to my GitHub repo and get the 06-DDQN Juypter Notebook and follow along. It will make this a lot easier and will fill you in on any of the missing pieces that I leave out in this write up. Also, I can’t put code into these posts without some plugins that are not allowed on my current tier.
In 2016, Google DeepMind (pdf) found another optimization to their algorithm. They used the idea from the double Q-Learner and added a second neural network. In the double Q-Learner the Q tables were used at random. In this algorithm, they are used as 2 separate entities. You will use the “target” network to predict the next steps during experience replay and then update your “source” network.
Please, download the notebook and give it a try. I even challenge you at the end to beat my solution in fewer iterations.
Open in Google Colab: 06-DDQN.ipynb