Double Deep Q Network

The fourth post in my series on RL from my graduate work at Georgia Tech covers the Double Deep Q-Network (DDQN) algorithm. I will use the algorithm to “solve” the OpenAI CartPole environment.

If you missed any of the previous blogs, here are the first, second, and third.

Please go to my GitHub repo, get the 06-DDQN Jupyter Notebook, and follow along. It will make this a lot easier and will fill you in on any of the missing pieces that I leave out in this write-up. Also, I can’t put code into these posts without some plugins that are not allowed on my current tier.

In 2016, Google DeepMind (pdf) found another optimization to their algorithm. They took the idea from the double Q-Learner and added a second neural network. In the double Q-Learner, the two Q tables were picked at random for each update. In this algorithm, the two networks play two separate, fixed roles: you use the “target” network to predict the value of the next steps during experience replay, and then update your “source” network.
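To make the two roles concrete, here is a minimal sketch of a DDQN update during experience replay. It is not the code from the notebook; the names source_model and target_model, the batch layout, and gamma are my own assumptions for illustration, with source_model standing in for the network you train and target_model for the one you periodically copy weights into.

```python
import numpy as np

# Minimal DDQN update sketch over one sampled minibatch (illustrative names).
# Assumes source_model / target_model are Keras models with predict()/fit(),
# and the batch is (states, actions, rewards, next_states, dones) arrays.
def ddqn_update(source_model, target_model, batch, gamma=0.99):
    states, actions, rewards, next_states, dones = batch
    rows = np.arange(len(actions))

    # The "source" network picks the best next action...
    next_actions = np.argmax(source_model.predict(next_states), axis=1)
    # ...but the "target" network evaluates how good that action is.
    next_q = target_model.predict(next_states)
    target_values = rewards + gamma * next_q[rows, next_actions] * (1.0 - dones.astype(np.float32))

    # Only the actions actually taken get new targets; the rest keep the
    # source network's current estimates so their error is zero.
    targets = source_model.predict(states)
    targets[rows, actions] = target_values

    # Train the source network; the target network is only updated by
    # copying the source weights every so often.
    source_model.fit(states, targets, epochs=1, verbose=0)
```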

Please download the notebook and give it a try. I even challenge you at the end to beat my solution in fewer iterations.

Open in Google Colab: 06-DDQN.ipynb

MIT Deep Learning Lecture 1

Today I watched the first of 20 lectures on Deep Learning from MIT. The first 2 weeks are 6.S094: Deep Learning for Self-Driving Cars. Week 3 will be 6.S091: Deep Reinforcement Learning. Week 4 will be 6.S093: Human-Centered Artificial Intelligence.

I won’t create one of these for each video, but I thought I would write one for the first lecture just to get started. I will do an overview of each course.

Lecture Slides: http://bit.ly/deep-learning-basics-slides

YouTube Video: https://www.youtube.com/watch?v=O5xeyoRL95U

Since the slides are available I won’t just recreate them with my notes. Instead, I will just add my thoughts below.

  • Lex Fridman dresses WAY better than I would ever hope to dress.
  • The camera person was having trouble keeping Lex in the little window.
  • Data, data, data. Data is VERY important to anything in the field.
  • “AI began with an ancient wish to forge the gods.” – Pamela McCorduck
  • “Sky is the limit with the tooling”
  • “Why Deep Learning? Scalable Machine Learning”
  • The brain has a lot of neurons. Ha. In my talks, I always mention how fast you can recognize your mother to show the scale and speed.
  • Activation functions:
    • Sigmoid -> Vanishing gradients. Not zero centered
    • Tanh -> Vanishing gradients.
    • ReLU -> Not zero centered.
  • Loss Functions:
    • Mean Squared Error (Regression): Real number
    • Cross Entropy Loss (Classification): {0,1}
  • Lex did a pretty nice backpropagation slide (see the sketch after this list)
    • Task: Update the weights and biases to decrease the loss function
    • Subtask:
      • Forward pass to compute network output and “error”
      • Backward pass to compute gradients
      • A fraction of the weight’s gradient is subtracted from the weight
  • Semantic Segmentation
    • Encoding -> Decoding
    • I have never seen this before so I will be very interested when we expand on this later.
  • The text to speech slides were interesting but not sure I want to get into that area.
  • AutoML might take my job before I even start it.
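Since the backpropagation bullet above maps so cleanly to code, here is a minimal sketch of that forward pass / backward pass / weight update loop using TensorFlow 2.x's GradientTape. The tiny model, the fake batch, and the learning rate are all placeholders I made up for illustration.

```python
import tensorflow as tf

# Tiny regression model; shapes and learning rate are made up for illustration.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
loss_fn = tf.keras.losses.MeanSquaredError()
learning_rate = 0.01

x = tf.random.normal((32, 4))  # fake batch of inputs
y = tf.random.normal((32, 1))  # fake batch of targets

# Forward pass to compute the network output and the "error" (loss).
with tf.GradientTape() as tape:
    predictions = model(x)
    loss = loss_fn(y, predictions)

# Backward pass to compute gradients of the loss w.r.t. the weights and biases.
gradients = tape.gradient(loss, model.trainable_variables)

# A fraction (the learning rate) of each gradient is subtracted from the weight.
for var, grad in zip(model.trainable_variables, gradients):
    var.assign_sub(learning_rate * grad)
```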

For the most part, this was a review from grad school. But it does make me regret that I didn’t get to take the self-driving car class at GT.

TensorFlow 2.0 Testing

Update: 20180228 TensorFlow Probability is now working with the nightly builds.

Today, it was announced that the TensorFlow 2.0 preview is available to download and test. I have a few simple projects that use TensorFlow, so I figured I would install the preview and see if my results are any different.

Well, I found a bug(ish). I created a ticket and it was quickly resolved. It turns out that Windows is dumb and has a limit on how long a file path can be. In Win32, this made sense. Today, not so much. After a registry change and a reboot, I was up and running. Great work from the TensorFlow team and bad work from the Windows developers. Ha.

Here is what I did:
  • Created a new environment called ‘tf_daily’ to handle all tests: conda create -n tf_daily python=3.6
  • Activated the environment: activate tf_daily
  • Installed the nightly build: pip install tf-nightly-2.0-preview
  • Installed the nightly probability library: pip install tfp-nightly
  • Installed pandas: pip install pandas
  • Installed Seaborn: pip install seaborn
  • Installed the nb_conda plugin so I can use this environment in Jupyter: conda install nb_conda
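As a quick sanity check (my own addition, not one of the steps above), importing both packages and printing their versions confirms the preview builds landed in the new environment:

```python
import tensorflow as tf
import tensorflow_probability as tfp

# The nightly 2.0 preview should report a version string like "2.0.0-dev...".
print(tf.__version__)
print(tfp.__version__)
```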

We will see how all of my testing goes. My guess is that some classes will get moved around on me.

Deep Q Network

The third post in my series on RL from my graduate work at Georgia Tech covers the Deep Q-Network (DQN) algorithm. I will use the algorithm to “solve” the OpenAI CartPole environment.

If you missed any of the previous blogs, here are the first and second.

Please go to my GitHub repo, get the 05-DQN Jupyter Notebook, and follow along. It will make this a lot easier and will fill you in on any of the missing pieces that I leave out in this write-up. Also, I can’t put code into these posts without some plugins that are not allowed on my current tier.

I skipped over my neural network notebook as it is basically some background knowledge and not much code. If you are going through the series do go back and look through it.

In 2015, Google DeepMind (link) published a paper in Nature magazine that combined a neural network with RL for the first time. They understood that using a neural network for function approximation would open up this algorithm to much larger environments. They used ONLY the raw pixels and the score as inputs and were able to master quite a few Atari games.

Google DeepMind used convolutional layers to turn the raw pixels into inputs for the network, which I don’t do here. At some point, I might try to recreate some of their results.

There are a few key differences between the Q-Learner and DQN. The first is that a Q-Learner learns from each step’s observation as it arrives. DQN uses what they call experience replay: the algorithm stores up all of the observations and then, at set times, grabs a batch of them and processes it. That batch is then fit on the neural network, using the built-in backpropagation to train it.
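To show what experience replay looks like in practice, here is a minimal replay-memory sketch. The class name, capacity, and batch size are my own illustrative choices, not the notebook's code.

```python
import random
from collections import deque

import numpy as np

class ReplayMemory:
    """Stores observed steps and hands back random batches for training."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        # Each step is stored instead of being learned from immediately.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # At set times, grab a random batch of past experiences to fit on the NN.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones
```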

Take a look at the notebook, where I go through the algorithm against the same CartPole environment.

Please download the notebook and give it a try. I even challenge you at the end to beat my solution in fewer iterations.

Open in Google Colab: 05-DQN.ipynb