Recently, Google DeepMind released a paper on their most recent RL agent, Agent57. It was named after the 57 Atari 2600 games that make up the Arcade Learning Environment, from which the Atari57 benchmark was created to summarize an agent's ability. The idea was that these games are different enough that mastering all of them would require a genuinely intelligent agent. Creating an agent that could outpace the average human across the whole suite would be a step closer to "general intelligence."
How we got here
In 1989, Chris Watkins developed the Q-Learning algorithm (my Jupyter Notebook on it!). The next enhancement was Double Q-Learning (another notebook), introduced in 2010 by Hado van Hasselt, which added a second Q table. DeepMind then added a neural network in 2015 and created the Deep Q-Network (my notebook!). Much like before, a second network was added in 2016 to create the Double Q-Network (I have a whole set of notebooks!!).
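For context, the core of Watkins' tabular Q-Learner fits in a few lines. This is a minimal sketch; the table sizes, learning rate, and epsilon below are illustrative, not taken from the notebooks:

```python
import random

# Q-table: Q[state][action] -> estimated return (5 states, 2 actions as a toy example)
n_states, n_actions = 5, 2
Q = [[0.0] * n_actions for _ in range(n_states)]

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Watkins' Q-Learning update:
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    td_target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])

def epsilon_greedy(Q, s, epsilon=0.1):
    # Explore with probability epsilon, otherwise exploit the current estimate.
    if random.random() < epsilon:
        return random.randrange(len(Q[s]))
    return max(range(len(Q[s])), key=lambda a: Q[s][a])

# One illustrative step: taking action 0 in state 0 yielded reward 1.0
q_update(Q, s=0, a=0, r=1.0, s_next=1)
```

Double Q-Learning's change is simply to keep two such tables and use one to pick the greedy action and the other to evaluate it, which reduces the overestimation bias of the single-table update.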
At this point, I stopped researching these and have been out of the loop. Once I figure out unlimited time, I will have to get back into it. Luckily, the researchers did not stop. They added Prioritised Replay, Dueling Heads, and distributional RL. Then they created R2D2, which had better short-term memory. In 2019, Never Give Up was created with better exploration and memory. A major leap forward during this time was the use of distributed agents, which allows training to be scaled across many workers and speeds up learning.
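As one example of these additions, Prioritised Replay changes uniform sampling from the replay buffer into sampling weighted by TD error, so surprising transitions are replayed more often. A rough sketch of the proportional variant (the class name and constants are mine, not from the papers):

```python
import random

class PrioritizedReplay:
    """Proportional prioritized replay (sketch): transitions with larger
    absolute TD error get proportionally higher sampling probability."""

    def __init__(self, alpha=0.6, eps=1e-6):
        self.alpha = alpha  # how strongly priorities skew sampling (0 = uniform)
        self.eps = eps      # keeps zero-error transitions sampleable
        self.data, self.priorities = [], []

    def add(self, transition, td_error):
        self.data.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self):
        # Pick index i with probability priority_i / sum(priorities)
        total = sum(self.priorities)
        r = random.uniform(0.0, total)
        acc = 0.0
        for i, p in enumerate(self.priorities):
            acc += p
            if r <= acc:
                return self.data[i]
        return self.data[-1]
```

The full algorithm also applies importance-sampling weights to correct the bias this sampling introduces, and real implementations use a sum-tree instead of a linear scan; both details are omitted here.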
What does this mean?
To me, it is awesome that we were able to get an agent that could handle games with immediate rewards as well as long-term rewards. We were able to handle games with a wide variety of tasks as well as a narrow set of tasks. And we were able to scale.
Does this mean we have found general intelligence? No, absolutely not. It does, however, open up a wide variety of possibilities: it makes previously unseen tasks approachable. We can deploy this agent into a new environment with a pretty solid expectation that it will be able to handle the task.
Anyway, this is awesome and I finally found some time to look into it. Maybe, I will even be able to break it out and play some games with it.
DeepMind Blog Post: Link
Last week I gave my RL talk to the attendees of Prairie.Code. It did not go well.
The talk really needs 90 minutes, and the reviews showed that people agreed: there was too much dense material, and I needed to slow down and cover it in more detail.
I don’t think I will continue giving the talk, since I have hit most of the local events, but if I do decide to give it again I need to expand on the base Q-Learner and make sure it is understood.
My plan now is to create a talk about deploying models to the cloud. I want to get into the AI Engine and TensorFlow.js and how they fit into the ecosystem.
I would also like to dig into MineRL and maybe use that as the engine to speak more on RL.
Anyway, I went in feeling rushed and I should have followed my gut and fixed it. Hopefully, it doesn’t keep me from speaking again.
I am giving a Reinforcement Learning talk at the GDG Denver group. I decided to upgrade my RL notebooks to TF2 and then add some of the TF-Agents material that was announced at Google I/O. As always, this is hosted on my GitHub page https://github.com/ehennis/ReinforcementLearning.
Here is a quick rundown of how I set up the environment to convert my Jupyter Notebooks to TF v2.
Following my old post, I created an environment, with a few changes since TF v2 is in beta now.
Commands to set up the environment:
conda create -n gdg_denver python=3.6
pip install tensorflow==2.0.0-beta1
pip install pandas
pip install seaborn
pip install gym
conda install nb_conda
Commands to launch the notebooks:
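Presumably these are the usual pair, assuming the gdg_denver environment created above (the original commands did not survive in this post, so this is a reconstruction):

```shell
conda activate gdg_denver
jupyter notebook
```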
Since my usage of TF and Keras is pretty straightforward, there wasn’t much to change. Nothing changes as far as ‘import tensorflow as tf‘ goes, but we do have to change where we get Keras: that is now ‘from tensorflow import keras‘.
Overall, it was a great experience. TWCC has done a great job in their 20+ years, and this year was no different. I wasn’t able to stay the entire time, but from everything I saw it was great. The facilities were perfect for an event this size, everyone appeared to be getting along, and there were multiple groups of people having conversations about the topics that were presented.
The only downside was that I had to leave home at 5am to get to the start and hit some ice on the way up. Can’t fight mother nature!
GitHub Repo: https://github.com/ehennis/ReinforcementLearning