DDQN TensorFlow v2 Upgrade

In my previous blog post, I showed how I upgraded my Black-Scholes/Monte Carlo notebook to use TensorFlow v2. Today, I am going to show how extremely easily I was able to convert DDQN to the pre-release of TensorFlow v2.

The notebook is located here: DDQN-TFv2.ipynb

Since I was mostly using Keras, there were a few library changes, but the code ran pretty much as is.
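For a sense of what those library changes look like (a minimal sketch, not the notebook's exact diff), the standalone Keras imports simply move under tf.keras:

    # Before (standalone Keras) -- illustrative of the kind of import that changes
    # from keras.models import Sequential
    # from keras.layers import Dense
    # from keras.optimizers import Adam

    # After (TensorFlow v2): everything lives under tf.keras
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.optimizers import Adam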

Double Deep Q Network

The fourth in my series on RL that I created in graduate school at Georgia Tech will be on the Double Deep Q-Network (DDQN) algorithm. I will use the algorithm to “solve” the OpenAI CartPole environment.

If you missed any of the previous blogs, here are the first, second, and third.

Please go to my GitHub repo and get the 06-DDQN Jupyter Notebook and follow along. It will make this a lot easier and will fill you in on any of the pieces that I leave out in this write-up. Also, I can’t put code into these posts without some plugins that are not allowed on my current tier.

In 2016, Google DeepMind (pdf) found another optimization for their algorithm. They took the idea from the double Q-Learner and added a second neural network. In the double Q-Learner, the two Q tables were chosen at random; in this algorithm, they play two separate roles. You use the “target” network to predict the next steps during experience replay and then update your “source” network.
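As a rough sketch of that two-network setup (function and variable names here are my own illustration, not the notebook's exact code), the replay step and the periodic weight sync might look like this, following the description above of letting the target network value the next step:

    import numpy as np

    def replay_step(source_model, target_model, batch, gamma=0.99):
        # batch: list of (state, action, reward, next_state, done) tuples
        states = np.array([b[0] for b in batch])
        next_states = np.array([b[3] for b in batch])

        q_current = source_model.predict(states, verbose=0)    # values we will correct
        q_next = target_model.predict(next_states, verbose=0)  # "target" net predicts the next steps

        for i, (_, action, reward, _, done) in enumerate(batch):
            q_current[i][action] = reward if done else reward + gamma * np.max(q_next[i])

        # update the "source" network on the corrected values
        source_model.fit(states, q_current, epochs=1, verbose=0)

    def sync_target(source_model, target_model):
        # periodically copy the source weights into the target network
        target_model.set_weights(source_model.get_weights())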

Please, download the notebook and give it a try. I even challenge you at the end to beat my solution in fewer iterations.

Open in Google Colab: 06-DDQN.ipynb

Deep Q Network

The third in my series on RL that I created in graduate school at Georgia Tech will be on the Deep Q-Network (DQN) algorithm. I will use the algorithm to “solve” the OpenAI CartPole environment.

If you missed any of the previous blogs, here are the first and second.

Please go to my GitHub repo and get the 05-DQN Jupyter Notebook and follow along. It will make this a lot easier and will fill you in on any of the pieces that I leave out in this write-up. Also, I can’t put code into these posts without some plugins that are not allowed on my current tier.

I skipped over my neural network notebook as it is basically some background knowledge and not much code. If you are going through the series, do go back and look through it.

In 2015, Google DeepMind (link) published a paper in Nature magazine that combined a neural network with RL for the first time. They understood that using function approximation from neural networks would open up this algorithm to a much larger class of environments. They used only the raw pixels and the score as inputs and were able to master quite a few Atari games.

Google DeepMind used convolutional layers to turn the raw pixels into network inputs, which I don’t do here. At some point, I might try to recreate some of their results.

There are a few key differences between the Q-Learner and DQN. The first is that a Q-Learner processes the current set of observations at each step. DQN uses what they call experience replay: the algorithm stores up all of the observations, and at set times it grabs a batch of them to process. It then fits that batch on the neural network and uses the built-in backpropagation to train the network.
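A minimal sketch of that replay memory (the names and buffer size here are assumptions, not the notebook's exact code):

    import random
    from collections import deque

    memory = deque(maxlen=2000)   # older experiences fall off the back

    def remember(state, action, reward, next_state, done):
        # store every observation as the agent steps through the environment
        memory.append((state, action, reward, next_state, done))

    def sample_batch(batch_size=32):
        # at set times, grab a random batch of stored experiences to fit the network on
        return random.sample(memory, min(batch_size, len(memory)))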

Take a look at the notebook, where I go through the algorithm against the same CartPole environment.

Please, download the notebook and give it a try. I even challenge you at the end to beat my solution in fewer iterations.

Open in Google Colab: 05-DQN.ipynb

Double Q-Learning

The second in my series on RL that I created in graduate school at Georgia Tech will be on the Double Q-Learning algorithm. I will use the algorithm to “solve” the OpenAI CartPole environment.

If you missed any of the previous blogs, here is the first.

Please go to my GitHub repo and get the 03-DoubleQLearning Jupyter Notebook and follow along. It will make this a lot easier and will fill you in on any of the pieces that I leave out in this write-up. Also, I can’t put code into these posts without some plugins that are not allowed on my current tier.

Double Q-Learning was created by Hado van Hasselt (who actually replied to my email when I was creating this project) in 2010. He noticed that using ‘max’ overestimates the action values, and you can get into a situation where your results vary widely. His solution was to use a second Q table and randomly swap between the tables, using the “other” table to grab the value for the update equation.
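A minimal sketch of that update (the tables are named Q_A and Q_B here purely for illustration; the notebook's code may differ):

    import numpy as np

    def double_q_update(Q_A, Q_B, state, action, reward, next_state, alpha=0.1, gamma=0.99):
        # randomly pick which table to update this step
        if np.random.rand() < 0.5:
            Q_update, Q_other = Q_A, Q_B
        else:
            Q_update, Q_other = Q_B, Q_A
        # choose the best next action with the table being updated,
        # but grab its value from the *other* table for the target
        best_next = np.argmax(Q_update[next_state])
        target = reward + gamma * Q_other[next_state][best_next]
        Q_update[state][action] += alpha * (target - Q_update[state][action])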

In the notebook, you can see the updated equations and try your best to code them up. After getting a correct solution, you can continue on to the fully coded algorithm and see if you can beat my best solution.

One thing to point out: the yellow line in this notebook is much smoother than in the original Q-Learner. That is what the second Q table fixed.

Please, download the notebook and give it a try. I even challenge you at the end to beat my solution in fewer iterations.

Open in Google Colab: 03-DoubleQLearning.ipynb

References
Hasselt, H. V. (2010). Double Q-learning. Advances in Neural Information Processing Systems 23, 2613-2621. Retrieved from http://papers.nips.cc/paper/3964-double-q-learning.pdf