Recently, Google DeepMind released a paper on their most recent RL agent, Agent57. It is named after the 57 Atari 2600 games from the Arcade Learning Environment that make up the Atari57 benchmark, which was created to summarize the ability of an agent. The idea was that these games are different enough from one another that doing well on all of them takes an intelligent agent. Creating an agent that could outpace the average human on them would be a step closer to “General Intelligence”.
How we got here
In 1989, Chris Watkins developed the Q-Learning algorithm (my Jupyter Notebook on it!). The next enhancement was the Double Q-Learner (another notebook), introduced in 2010 by Hado van Hasselt, which added a second Q table. DeepMind then swapped the table for a neural network in 2015 and created the Deep Q-Network (my notebook!). Much like before, a second network was added in 2016 to create the Double Deep Q-Network (I have a whole set of notebooks!!).
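To make the difference between those first two steps concrete, here is a minimal sketch of one tabular Q-learning update next to one Double Q-learning update. It is not taken from my notebooks or from the papers' code; the function names, the `ALPHA` and `GAMMA` values, and the dictionary-based tables are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict

ALPHA = 0.1   # learning rate (assumed value for illustration)
GAMMA = 0.99  # discount factor (assumed value for illustration)


def q_learning_update(q, state, action, reward, next_state, actions, done):
    """One tabular Q-learning step: bootstrap from the max next-state value."""
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    target = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (target - q[(state, action)])


def double_q_update(q_a, q_b, state, action, reward, next_state, actions,
                    done, rng=random):
    """One Double Q-learning step: one table picks the action, the other scores it."""
    # Flip a coin to decide which table is updated on this step.
    if rng.random() < 0.5:
        q_a, q_b = q_b, q_a
    if done:
        next_value = 0.0
    else:
        # Select the greedy action with the table being updated...
        best_action = max(actions, key=lambda a: q_a[(next_state, a)])
        # ...but evaluate it with the other table to reduce overestimation.
        next_value = q_b[(next_state, best_action)]
    target = reward + GAMMA * next_value
    q_a[(state, action)] += ALPHA * (target - q_a[(state, action)])


# Hypothetical usage with two actions and integer states.
q = defaultdict(float)
q_learning_update(q, state=0, action=1, reward=1.0, next_state=1,
                  actions=[0, 1], done=False)
```

The deep versions keep the same targets but replace the tables with neural networks, which is roughly the jump from the 2010 work to the 2015 and 2016 work.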
At this point, I stopped researching these and have been out of the loop. Once I figure out unlimited time I will have to get back into it. Luckily, the researchers did not stop. They added Prioritised Replay, Dueling Heads, and distributional agents. Then they created R2D2 (Recurrent Replay Distributed DQN) with better short-term memory. In 2019, Never Give Up was created with better exploration and memory. A major leap forward during this time was the use of distributed agents: running many actors in parallel lets training scale up and speeds up learning.
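Of the additions listed above, Prioritised Replay is the easiest to show in a few lines. The sketch below assumes the proportional variant, where a transition is sampled with probability proportional to its priority raised to `alpha`, and importance-sampling weights controlled by `beta` correct for the skewed sampling. The class name, the default values, and the list-based storage are my own assumptions; real implementations typically use a sum tree for speed, and this one normalises the weights within the batch for simplicity.

```python
import random


class PrioritisedReplay:
    """Toy proportional prioritised replay buffer (illustrative, O(n) sampling)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities skew sampling (assumed default)
        self.items = []
        self.priorities = []

    def add(self, transition, priority=None):
        # New transitions default to the current max priority so they are
        # likely to be sampled before their TD error is known.
        if priority is None:
            priority = max(self.priorities, default=1.0)
        if len(self.items) >= self.capacity:
            # Drop the oldest transition once the buffer is full.
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size, beta=0.4, rng=random):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        indices = rng.choices(range(len(self.items)), weights=probs, k=batch_size)
        # Importance-sampling weights, normalised by the largest weight in the batch.
        n = len(self.items)
        weights = [(n * probs[i]) ** -beta for i in indices]
        max_weight = max(weights)
        weights = [w / max_weight for w in weights]
        return indices, [self.items[i] for i in indices], weights

    def update(self, indices, td_errors, eps=1e-6):
        # Priority is the absolute TD error plus a small constant so that
        # no transition ends up with zero sampling probability.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + eps
```

In a training loop you would call `sample`, compute TD errors for the batch, scale each loss term by its weight, and then call `update` with the new errors.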
What does this mean?
To me, it is awesome that we now have an agent that can handle games with immediate rewards as well as games with long-term rewards. It can handle games with a large array of tasks as well as games with a smaller one. And it scales.
Does this mean we have found General Intelligence? No, absolutely not. It does, however, open up a wide variety of possibilities. It makes an unknown task look possible to solve: we can deploy this agent into a new environment and have a pretty solid expectation that it will be able to handle the task.
Anyway, this is awesome, and I finally found some time to look into it. Maybe I will even be able to break it out and play some games with it.
DeepMind Blog Post: Link