The first post in my RL series, created during graduate school at Georgia Tech, covers the Q-Learning algorithm. I use the algorithm to “solve” two different OpenAI Gym environments: an altered FrozenLake and CartPole.
Just note that I am skipping over the first notebook, as that is just an introduction to MDPs and policy/value iteration (PI/VI).
Please go to my GitHub repo, grab the 02-QLearning Jupyter Notebook, and follow along. It will make this a lot easier and will fill in any pieces I leave out of this write-up. Also, I can’t put code into these posts without plugins that aren’t allowed on my current tier.
Quick Introduction: Q-learning is an RL technique introduced in 1989 by Chris Watkins [web page], building on Sutton and Barto’s work on reinforcement learning. During his research he came up with a new algorithm that, unlike dynamic-programming methods for MDPs, doesn’t require a model of the environment.
The next few segments of the notebook explain some hyperparameters and methodologies, and finally show some pen-and-paper examples.
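To give a flavor of what those pen-and-paper examples work through, here is a minimal sketch of the tabular Q-learning update rule. The function and variable names here are my own, not necessarily the notebook's:

```python
import numpy as np

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = reward + gamma * np.max(Q[next_state])   # bootstrapped return
    td_error = td_target - Q[state, action]              # how surprised we are
    Q[state, action] += alpha * td_error                 # nudge toward target
    return Q

# Tiny worked example: 2 states, 2 actions, Q-table initialized to zeros.
Q = np.zeros((2, 2))
Q = q_update(Q, state=0, action=1, reward=1.0, next_state=1)
print(Q[0, 1])  # 0.1 * (1.0 + 0.99 * 0 - 0) = 0.1
```

Alpha controls how far each update moves the estimate, and gamma controls how much future reward is worth relative to immediate reward.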
Next, I cover solving the FrozenLake example by creating a custom environment that removes the slippage. I do this so users can easily see the optimal solution without having to run many more iterations when their chosen actions don’t do what they expect.
This is a pretty straightforward example for getting a grasp on the update rule as well as how the Gym environments work.
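As a rough illustration of that example, here is a self-contained sketch of tabular Q-learning on a deterministic 4x4 FrozenLake-style grid. The layout matches Gym's FrozenLake map, but the environment code and hyperparameters are my own stand-ins, not the notebook's custom environment:

```python
import numpy as np

# Deterministic 4x4 FrozenLake-style grid, row-major:
# S = start, F = frozen, H = hole (terminal), G = goal (terminal, reward 1).
MAP = "SFFF" "FHFH" "FFFH" "HFFG"
N_STATES, N_ACTIONS = 16, 4           # actions: 0=left, 1=down, 2=right, 3=up

def step(s, a):
    r, c = divmod(s, 4)
    if a == 0:   c = max(c - 1, 0)
    elif a == 1: r = min(r + 1, 3)
    elif a == 2: c = min(c + 1, 3)
    else:        r = max(r - 1, 0)
    ns = 4 * r + c
    return ns, float(MAP[ns] == "G"), MAP[ns] in "GH"  # next state, reward, done

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma = 0.8, 0.95
for _ in range(5000):                 # off-policy: behave randomly, learn greedily
    s, done, steps = 0, False, 0
    while not done and steps < 100:   # cap episode length
        a = int(rng.integers(N_ACTIONS))
        ns, reward, done = step(s, a)
        target = reward + (0.0 if done else gamma * np.max(Q[ns]))
        Q[s, a] += alpha * (target - Q[s, a])
        s, steps = ns, steps + 1

# After training, the greedy policy should walk from start to goal.
s, done, reward = 0, False, 0.0
for _ in range(20):
    s, reward, done = step(s, int(np.argmax(Q[s])))
    if done:
        break
print("reached goal:", reward == 1.0)
```

Because Q-learning is off-policy, a purely random behavior policy is enough to learn the optimal Q-table on this small deterministic grid; in a recent Gym/Gymnasium you can get the same non-slippery behavior with `gym.make("FrozenLake-v1", is_slippery=False)` instead of a custom environment.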
Continuous Environments: This section is where I introduce an environment whose state space can’t be held in memory as a table. This requires us to “discretize” the observations. I go through some steps that show the user the range of values for each observable variable, and then chunk those values into buckets to trim the possible state space down to something manageable.
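To make the discretization step concrete, here is a sketch of one common way to bucket CartPole's four observation variables (cart position, cart velocity, pole angle, pole angular velocity) into a small table. The bucket counts and clipping ranges are my illustrative choices, not necessarily the notebook's:

```python
import numpy as np

BINS = (6, 6, 12, 12)                       # buckets per observation variable
LOW  = np.array([-2.4, -3.0, -0.21, -3.0])  # assumed lower bounds after clipping
HIGH = np.array([ 2.4,  3.0,  0.21,  3.0])  # assumed upper bounds after clipping
EDGES = [np.linspace(l, h, n - 1) for l, h, n in zip(LOW, HIGH, BINS)]

def discretize(obs):
    """Map a 4-vector of floats to a tuple of integer bucket indices."""
    return tuple(int(np.digitize(x, e)) for x, e in zip(obs, EDGES))

# A Q-table over the discretized space fits easily in memory:
Q = np.zeros(BINS + (2,))                   # 6*6*12*12 states x 2 actions
state = discretize([0.1, -0.5, 0.02, 1.0])
print(state, Q[state].shape)                # (3, 2, 6, 7) (2,)
```

The indexing trick is that `discretize` returns a tuple, so `Q[state]` lands directly on the row of action values for that bucketed state.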
Finally, I put everything together and code up the algorithm with fairly good results.
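One ingredient of the assembled algorithm worth calling out is the exploration schedule. A common choice is epsilon-greedy action selection with epsilon annealed over episodes; the numbers and names below are illustrative, not necessarily what the notebook uses:

```python
import numpy as np

rng = np.random.default_rng(0)

def choose_action(Q_row, eps):
    """Explore with probability eps, otherwise exploit the best known action."""
    if rng.random() < eps:
        return int(rng.integers(len(Q_row)))
    return int(np.argmax(Q_row))

eps, eps_min, decay = 1.0, 0.01, 0.995
for episode in range(5):
    eps = max(eps_min, eps * decay)   # anneal exploration each episode
print(round(eps, 4))                  # 0.9752
```

Starting with high exploration and decaying toward a small floor lets the agent discover rewarding trajectories early, then exploit them once the Q-table is reasonably accurate.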
Please, download the notebook and give it a try. I even challenge you at the end to beat my solution in fewer iterations.
Open in Google Colab: 02-QLearning.ipynb
My final course in the GT OMS CS program was 6460 Educational Technology (I took two classes that last semester, but the other one wasn’t as cool even though it had the same instructor). The purpose of the course was to contribute to educational technology. You could do research, create content, or write code, with the hope that you continue after the semester is over.
Going into the course I knew I wanted to create something that would teach Reinforcement Learning. I got the idea from the head TA of the RL course I took in the summer of 2017. He created a GitHub repository that complemented the course’s book, and he has since gone on to write a book based on his work.
Since his work was so good, I wanted to go in a slightly different direction, so I decided to focus on just Q-Learning and its various iterations.
Here is the result: https://github.com/ehennis/ReinforcementLearning
I created 6 Jupyter Notebooks that take the user through each algorithm, hopefully introducing them to RL and getting them hooked like I was when I first saw it.
Feel free to fork/copy/etc. the repo and see how it goes. If I get motivated enough I might try and turn this into a video series with someone.
Other notes about the course: The first section of the course involved research on general education technologies as well as different learning styles. I fought this at first, but after the first few weeks I accepted that I needed to do the legwork, and I found out some pretty cool stuff.
After 3 years and 10 classes (plus 1 I dropped when the baby was born) I finally graduated. I now have a master’s degree in computer science with a specialization in machine learning.
Where to now? I am not sure. I would really like to get working with Amazon DeepRacer. I also want to get involved with the Google Developer Experts program for TensorFlow. So, my learning is just beginning.
In the near future I will post a few things from school that I have completed. I just created a project that gives an introduction to reinforcement learning, which I used for my Iowa Code Camp talk.
Ideally, after spending some time with my family, I would like to keep this blog up to date with what I am working on.
Until next time!