Prairie.Code Talk

Last week I gave my RL talk to the attendees of Prairie.Code. It did not go well.

The talk really should be 90 minutes, and the reviews showed that people agreed. There was too much dense material that I needed to slow down and cover in more detail.

I don’t think I will continue giving the talk since I have hit most of the local conferences, but if I do decide to give it again I need to expand on the base Q-learner and make sure that is understood.

My plan now is to create a talk about deploying models to the cloud. I want to get into AI Engine and TensorFlow.js and how they fit into the ecosystem.

I would also like to dig into MineRL and maybe use that as the engine to speak more on RL.

Anyway, I went in feeling rushed and I should have followed my gut and fixed it. Hopefully, it doesn’t keep me from speaking again.

GDG Denver: RL Talk

I am giving a Reinforcement Learning talk at the GDG Denver group. I decided to upgrade my RL notebooks to TF 2 and then add some of the TF-Agents material that was announced at Google I/O. As always, this is hosted on my GitHub page: https://github.com/ehennis/ReinforcementLearning.

Here is a quick rundown of how I set up the environment to convert my Jupyter Notebooks to TF v2.

I used my old post to create the environment, with a few changes since TF v2 is in beta now.

Commands to setup the environment:

conda create -n gdg_denver python=3.6
activate gdg_denver
pip install tensorflow==2.0.0-beta1
pip install pandas
pip install seaborn
pip install gym
conda install nb_conda

Commands to launch the notebooks:

jupyter notebook

Since I am pretty straightforward in my usage of TF and Keras, there wasn’t much to change. Nothing changes as far as ‘import tensorflow as tf‘ goes, but we do have to change where we get Keras. That is now ‘from tensorflow import keras‘.
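
For example, here is a minimal sketch of what the top of a notebook looks like after the change. The little network below is just an illustration, not one of the actual agents in the repository:

import tensorflow as tf
from tensorflow import keras  # previously: import keras

print(tf.__version__)  # should print 2.0.0-beta1 in this environment

# A tiny illustrative network; the real notebooks build their own Q-networks.
model = keras.Sequential([
    keras.layers.Dense(24, activation='relu', input_shape=(4,)),
    keras.layers.Dense(2, activation='linear')
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss='mse')
model.summary()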

Introducing FishButler!

At the start of the summer I wanted to do something that would keep my nephew (13) and daughter (11) productive and off of YouTube or Fortnite. My plan was to create a fairly simple Android application that they could help me with and get published before they went back to school. FishButler is that application.

I threw them into the deep end with full test coverage (unit and user) and source control. For testing, I used the built-in testing frameworks in Android Studio. For source control, I had them create branches, tags, and pull requests in GitHub so they got to experience it firsthand. I wanted them to see the full development life cycle.

To get started, I took a day off of work and had them sit around a table with their computers and we had a mini hackathon. After that they went on their ways and handled the GitHub tasks I assigned them.

My nephew had done some intro programming but nothing at this level. My daughter had done nothing more than watch me. To help, I created 10 documents that covered coding standards as well as Git commands. For their tasks, I did a fairly detailed write-up so they could see a high-level overview of each step and then the commands to do it.

In the future, I would like to add some image recognition as well as other external items like Maps and reports. I will also have them do some more open-ended research tasks to get a better feel for Android as a whole.

NLP: Natural Language Processing

My quest to get a full-time spot practicing what I have learned in ML has brought me to a job posting for a team that will process complaints and allow the company to analyze them to determine the next steps in improving customer response.

I haven’t dug much into text analysis outside of a semester-long project where we used movie summaries to determine the viability of a movie. We used the top 100 keywords and then trained a neural network on them.
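
As a rough sketch of that idea (the summaries and labels below are made up; the actual class project used real movie data and its own top 100 keywords):

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from tensorflow import keras

# Hypothetical data: movie summaries and a 0/1 "viable" label.
summaries = ["a detective hunts a killer in the city",
             "two friends take a road trip and find love",
             "an alien invasion forces the world to unite",
             "a small town bakery struggles to stay open"]
labels = np.array([1, 0, 1, 0])

# Keep only the most frequent keywords (the project used the top 100).
vectorizer = CountVectorizer(max_features=100, binary=True)
X = vectorizer.fit_transform(summaries).toarray()

# A small dense network over the keyword vector.
model = keras.Sequential([
    keras.layers.Dense(32, activation='relu', input_shape=(X.shape[1],)),
    keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, labels, epochs=10, verbose=0)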

Wikipedia describes NLP as a way to program computers to process and analyze large amounts of natural language data.

Over the next few weeks I am going to throw myself into the NLP world and see how much I can learn.

QwikLabs: Intro to ML: Image Processing

As stated in my previous post, I was given 1000 credits (~$1000) for QwikLabs. Today, I finished my first “quest”. It was titled Intro to ML: Image Processing.

I will restate this: I LOVE how QwikLabs are set up. They give you an entire Google Cloud account so you don’t have to mess with your own account and risk unwanted billing or other changes. Once the lab is done, the account gets deleted and you go on your way.

This lab covered a few different aspects of Google Cloud. First, the console. If you are familiar with working in Linux, this is a simple transition. Second, we worked with the storage system called “buckets”. We messed around with some pretty simple permissions as well as uploading files for processing.

The part that I liked most was using AI Engine to host a trained model. It was a super simple model, but since I failed at this last time it was cool to see it work as expected. Plus, they showed how to host it and access it externally. This will definitely be something I do once they start supporting TF v2.
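
For reference, accessing a hosted model from outside the console looks roughly like this; the project name, model name, and input instance below are placeholders, not the lab’s actual values:

import subprocess
import requests

# Placeholder project and model names for illustration.
PROJECT = "my-project"
MODEL = "my_model"

# Grab an OAuth token from the locally configured gcloud SDK.
token = subprocess.check_output(
    ["gcloud", "auth", "print-access-token"]).decode().strip()

url = f"https://ml.googleapis.com/v1/projects/{PROJECT}/models/{MODEL}:predict"
body = {"instances": [[1.0, 2.0, 3.0, 4.0]]}  # shape depends on the model

response = requests.post(url, json=body,
                         headers={"Authorization": f"Bearer {token}"})
print(response.json())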

The last few sections used the API to process images. The first was simple recognition. This stood out because you could change the calling JSON and have it return internet articles that contained the same image. Second, we processed an image to detect people’s faces and possible emotions, as well as landmarks. Finally, we processed a sign with some French text on it. We were able to translate it to English and add some more processing that would give us information (links, etc.) about what was printed.
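
The calling JSON you tweak looks something like this (the bucket path and API key are placeholders); swapping the feature types is what changes what comes back:

import requests

API_KEY = "YOUR_API_KEY"  # placeholder
url = f"https://vision.googleapis.com/v1/images:annotate?key={API_KEY}"

body = {
    "requests": [{
        "image": {"source": {"gcsImageUri": "gs://my-bucket/sign.jpg"}},
        "features": [
            {"type": "LABEL_DETECTION"},    # simple recognition
            {"type": "WEB_DETECTION"},      # pages containing the same image
            {"type": "FACE_DETECTION"},     # faces and likely emotions
            {"type": "LANDMARK_DETECTION"},
            {"type": "TEXT_DETECTION"}      # pull the French text off the sign
        ]
    }]
}

print(requests.post(url, json=body).json())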

Overall, VERY COOL first lab. I will get started on my next round of cloud training soon.

Pluribus: Facebook and Carnegie Mellon’s Poker AI

Article: https://science.sciencemag.org/content/early/2019/07/10/science.aay2400

There are few things more frustrating to me in the machine learning/AI world than seeing Buzzfeed-type companies write about ML/AI. Sites like that boil the actual facts down into clickbait titles. It seems any “learning” that occurs is one step away from Skynet. I honestly don’t have a single site I trust where I can go and see the facts laid out. I always have to fall back on the technical paper, if one was written, or something from the actual authors.

I come from an ML background with very little AI experience, but most of the latest advancements are close enough to Reinforcement Learning that I can piece them together. So, this is my attempt to do just that.

Poker and Machines

I have long wanted to be part of something that would be able to “solve” poker. The entire state and action space fascinates me. I was always of the opinion that with infinite knowledge you could beat emotional players. It always seemed the best and most consistent players were the mellow mathematicians who approached the game like a math problem. At each step, there are known percentages on the different plays. A computer, with far more memory than we could ever hope to have, could keep all of this available.

With Google doing so well at chess and Go, I figured it was fairly close to cracking poker. Obviously, poker doesn’t have “perfect” information since you don’t know what your opponents have in their hands.

Learning From Machines

My biggest excitement going forward is how humans can learn from machines. It has been stated that the top chess players have learned a few new opening strategies after playing AlphaZero.

EXCLUDING THE ETHICAL ASPECT, I am curious to see what we could learn on the battlefield from an advanced war simulation, whether there is something out there that could save lives during our ongoing battles that humans have never even thought of.

Evan’s Summary

Discretization

From what I can tell, the researchers at CMU had the same problem early Q-learning did: the action space and state space were too large to handle in a traditional array. Q-learning eventually moved to neural networks, but early on the fix was to discretize the input. That is what Pluribus is doing with the actions and information.

For the action space, they group the bet sizes (think $105 is treated the same as $100 or $110) and then during play they use a search algorithm to narrow down the decision.
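
Here is a toy sketch of that kind of bet-size bucketing; the bucket boundaries are made up for illustration and are not the ones Pluribus actually uses:

# Hypothetical bet buckets, in dollars.
BET_BUCKETS = [50, 100, 200, 400, 800, 1600]

def bucket_bet(amount):
    """Map an arbitrary bet size to the nearest bucket so $105 and $110
    look like the same action as $100 when training the blueprint."""
    return min(BET_BUCKETS, key=lambda b: abs(b - amount))

print(bucket_bet(105))  # -> 100
print(bucket_bet(110))  # -> 100
print(bucket_bet(950))  # -> 800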

For the information collected, they group similar hands together IN FUTURE ROUNDS since those hands can be played the same way. But in the current round they use the exact hand.

Offline Training (Blueprint)

To build the offline model they used what is commonly called counterfactual regret minimization (CFR). Basically, after the hand is over they go back through and decide what “should” have been done and how it would have affected the outcome. What is new (to me at least) is that they used Monte Carlo sampling of actions instead of traversing the entire game tree. Because the AI controls all the players in the offline games, it knows all of this information.

Counterfactual Regret

CFR guarantees that the average strategy converges to a Nash equilibrium in a two-player zero-sum game.

CFR guarantees in all finite games that all counterfactual regrets grow sublinearly in the number of iterations. This, in turn, guarantees in the limit that the average performance of CFR on each iteration that was played matches the average performance of the best single fixed strategy in hindsight. CFR is also proven to eliminate iteratively strictly dominated actions in all finite games

Superhuman AI for multiplayer poker
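
To make the regret idea a little more concrete, here is a minimal sketch of the regret-matching step at the heart of CFR for a single toy decision point. This is the textbook version, not Pluribus’s actual implementation:

import numpy as np

ACTIONS = ["fold", "call", "raise"]

def regret_matching(cumulative_regret):
    """Turn accumulated regrets into a strategy: play actions in proportion
    to their positive regret, or uniformly if nothing has positive regret."""
    positive = np.maximum(cumulative_regret, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.ones(len(cumulative_regret)) / len(cumulative_regret)

# Toy counterfactual values for one iteration: what each action "would have"
# earned at this decision point.
action_values = np.array([-1.0, 0.5, 2.0])
strategy = np.array([1/3, 1/3, 1/3])
node_value = strategy @ action_values

# Regret = how much better each action would have done than what we played.
cumulative_regret = action_values - node_value
print(dict(zip(ACTIONS, regret_matching(cumulative_regret))))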

Training Time

According to the paper, the training was done over 8 days on a 64-core server, using 12,400 CPU core hours and less than 512 GB of memory. They assumed current cloud rates and said it would cost about $144 to produce.

Playing Strategy

Because the offline blueprint is “coarse” due to the complexity of poker, they only use it for the first betting round, where they considered it safe to use. After that, or if a player confuses it by betting an odd amount, they use real-time search.

While real-time search has been successful in perfect-information games, there is a problem in imperfect-information games. The paper says it is “fundamentally broken” there. The example is Rock/Paper/Scissors. In my previous blog post I showed that the equilibrium is to play each move with probability 1/3. A naive search would see that every move has the same value and might just pick scissors every time. This would cause an issue as the other player would figure this out and win every time with rock.

There were two alternatives used in the past. The DeepStack AI determined a leaf node’s value based on the strategy used to get to the leaf. This doesn’t scale with a large tree. The other alternative was used in the Libratus AI, which would only use the search when it could extend it to the end of the game. This wouldn’t work with the addition of extra poker players.

Pluribus instead uses a modified form of an approach that we recently designed—previously only for two-player zero-sum games (41)— in which the searcher explicitly considers that any or all players may shift to different strategies beyond the leaf nodes of a subgame. Specifically, rather than assuming all players play according to a single fixed strategy beyond the leaf nodes (which results in the leaf nodes having a single fixed value) we instead assume that each player may choose between k different strategies, specialized to each player, to play for the remainder of the game when a leaf node is reached.

Superhuman AI for multiplayer poker

They used four continuation strategies that were all based on the baseline blueprint strategy. The first was the actual blueprint, the second was the blueprint with a bias towards folding, the third was a bias towards raising, and the fourth was a bias towards calling.
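
A rough sketch of what “biasing” the blueprint could look like; the probabilities and the bias factor below are made up for illustration:

import numpy as np

ACTIONS = ["fold", "call", "raise"]

def bias_strategy(strategy, action, factor=5.0):
    """Scale up one action's probability and renormalize, giving a version
    of the blueprint that leans towards folding, calling, or raising."""
    biased = np.array(strategy, dtype=float)
    biased[ACTIONS.index(action)] *= factor
    return biased / biased.sum()

blueprint = [0.2, 0.5, 0.3]
continuations = {
    "blueprint": np.array(blueprint),
    "fold-biased": bias_strategy(blueprint, "fold"),
    "raise-biased": bias_strategy(blueprint, "raise"),
    "call-biased": bias_strategy(blueprint, "call"),
}
for name, strat in continuations.items():
    print(name, np.round(strat, 2))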

Bluffing

Another issue with the imperfect-information game is bluffing. The obvious strategy is to play your best hands and fold your worst hands, but if you do this all the time the other players will know what you have. To help solve this, Pluribus determines the probability of reaching the current point with each possible hand it could hold. It then computes a strategy that is balanced across all of those hands and acts according to it.

Testing

Pluribus was tested against elite human players (each had won at least $1 million playing poker) in two formats. The first was 5 humans against 1 copy of Pluribus, and the second was 1 human against 5 Pluribus instances.

The measurement was milli big blinds per game (mbb/game): the number of big blinds won per 1,000 hands of poker. Pluribus ended up winning 48 mbb/game, which is considered a very high win rate and shows it is “better” than the players it was up against.
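
To put that number in context, here is the simple arithmetic; the blind size is just an example stake, not necessarily what was used in the experiment:

# 48 mbb/game means 48 thousandths of a big blind won per hand on average.
mbb_per_game = 48
hands = 10_000
big_blind = 100  # example stake in dollars

big_blinds_won = mbb_per_game * hands / 1000   # 480 big blinds
dollars_won = big_blinds_won * big_blind       # $48,000 over 10k hands
print(big_blinds_won, dollars_won)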

Conclusion

I hope this helps cut through the hype in most articles written about the subject and is maybe a little easier to read than the Science article linked above.

I am excited to see if this will lead any poker players to change their game or if this will kill the online poker world.