NLP: Natural Language Processing

My quest for a full-time spot practicing what I have learned in ML has brought me to a job posting: the role would process customer complaints and let the company analyze them to determine next steps in improving customer response.

I haven't dug much into text analysis outside of a semester-long project where we used movie summaries to predict the viability of a movie. We extracted the top 100 keywords and then trained a neural network on them.
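A minimal sketch of that keyword approach, from memory (the summaries and keyword count here are invented for illustration, and a real pipeline would filter stopwords):

```python
from collections import Counter

def top_keywords(summaries, k=100):
    """Collect the k most common words across all summaries."""
    counts = Counter(word for s in summaries for word in s.lower().split())
    return [word for word, _ in counts.most_common(k)]

def to_feature_vector(summary, keywords):
    """One count per keyword -- the input vector fed to the neural network."""
    words = Counter(summary.lower().split())
    return [words[kw] for kw in keywords]

summaries = ["a heist movie about a heist", "a quiet drama about family"]
keywords = top_keywords(summaries, k=5)
vec = to_feature_vector("another heist movie", keywords)
```

Each summary becomes a fixed-length count vector, which is what makes a plain feed-forward network usable on free text.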

Wikipedia describes NLP as a way to program computers to process and analyze large amounts of natural language data.

Over the next few weeks I am going to throw myself into the NLP world and see how much I can learn.

QwikLabs: Intro to ML: Image Processing

As stated in my previous post, I was given 1000 credits (~$1000) for QwikLabs. Today, I finished my first “quest”. It was titled Intro to ML: Image Processing.

I will restate this: I LOVE how QwikLabs is set up. They give you an entire temporary Google Cloud account so you don't have to mess with your own account and risk unwanted billing and other changes. Once the lab is done, the account gets deleted and you go on your way.

This lab covered a few different aspects of Google Cloud. First, the console: if you are familiar with working in Linux, this is a simple transition. Second, we worked with the storage system, "Buckets," messing around with some pretty simple permissions as well as uploading files for processing.

The part that I liked most was using the AI-Engine to host a trained model. It was a super simple model, but since I failed at this last time, it was cool to see it work as expected. Plus, they showed how to host a model and access it externally. This will definitely be something I do once they start supporting TFv2.

The last few sections used the API to process images. The first was simple recognition. This stood out because you could change the calling JSON and have it return internet articles that contained the same image. Second, we processed an image to detect people's faces and possible emotions, as well as landmarks. Finally, we processed a sign with some French text on it. We were able to translate it to English as well as add some more processing that would give us information (links, etc.) about what was printed.

Overall, VERY COOL first lab. I will get started on my next round of cloud training soon.

Pluribus: Facebook and Carnegie Mellon’s Poker AI


There are few things more frustrating to me in the machine learning/AI world than seeing Buzzfeed-type companies write about ML/AI. Sites like that boil the actual facts down into clickbait titles. It seems any "learning" that occurs is a step away from Skynet. I honestly don't have a single site that I trust to display the facts. I always have to fall back on the technical paper (if one was written) or something from the actual authors.

I come from an ML background with very little AI experience, but most of the latest advancements are similar enough to reinforcement learning that I can piece them together. So, this is my attempt to do just that.

Poker and Machines

I have long wanted to be part of something that could "solve" poker. The entire state and action space fascinates me. I was always of the opinion that, with enough knowledge, you could beat emotional players. It always seemed the best and most consistent players were the mellow mathematicians who approached the game like a math problem. At each step, there are known percentages for different plays. A computer, with far more memory than we could ever hope to have, could keep all of this available.

With Google doing so well at chess and Go, I figured poker was fairly close behind. Obviously, poker doesn't have "perfect" information, since you don't know what your opponents hold.

Learning From Machines

My biggest excitement going forward is how humans can learn from machines. It has been stated that the top chess players have learned a few new opening strategies after playing AlphaZero.

EXCLUDING THE ETHICAL ASPECT, I am curious to see what we could learn on the battlefield from an advanced war simulation. Perhaps there is a strategy out there, one humans have never even thought of, that could save lives during ongoing conflicts.

Evan’s Summary


From what I can tell, the researchers at CMU hit the same problem that early Q-learning did: the action space and state space were too large to handle in a traditional array. Q-learning eventually moved to neural networks, but early on it discretized the input. That is what Pluribus does with its actions and information.

For the action space, they group bet sizes together (think of $105 as the same as $100 or $110) and then, during play, use a search algorithm to narrow down the decision.
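That grouping can be pictured as simple bucketing. A toy sketch (the bucket width here is my own invention; the paper's abstraction is more sophisticated):

```python
def bucket_bet(amount, bucket_size=25):
    """Round a bet to the nearest bucket so nearby bets share one action."""
    return round(amount / bucket_size) * bucket_size

# $100, $105, and $110 all land in the same $100 bucket,
# so the blueprint only has to learn one action for all three.
same_bucket = {bucket_bet(100), bucket_bet(105), bucket_bet(110)}
```

Collapsing nearby bet sizes is what shrinks the action space enough to train on; the real-time search then refines the choice back toward exact amounts.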

For the information collected, they group similar hands together IN FUTURE ROUNDS, since those get played the same way. In the current round, however, they use the exact hand.

Offline Training (Blueprint)

To build the offline model they used what is commonly called counterfactual regret minimization (CFR). Basically, after the hand is over, the algorithm goes back through and decides what "should" have been done and how it would have affected the outcome. What is new (to me at least) is that they used Monte Carlo sampling of actions rather than traversing the entire game tree. Because the AI plays all the seats during offline training, it knows all of this information.

Counterfactual Regret

CFR guarantees the overall interactions converge to a nash equilibrium in a zero-sum game.

CFR guarantees in all finite games that all counterfactual regrets grow sublinearly in the number of iterations. This, in turn, guarantees in the limit that the average performance of CFR on each iteration that was played matches the average performance of the best single fixed strategy in hindsight. CFR is also proven to eliminate iteratively strictly dominated actions in all finite games.

Superhuman AI for multiplayer poker
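At the core of CFR is regret matching: pick each action in proportion to its accumulated positive regret. A bare-bones sketch of that update (my simplification, not the paper's implementation):

```python
def regret_matching(regrets):
    """Turn accumulated regrets into a strategy: positive regret -> probability."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        # No positive regret yet: fall back to uniform play.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]

# Regrets accumulated for [rock, paper, scissors]:
# paper was "regretted" most, so it gets played most often.
strategy = regret_matching([10.0, 30.0, -5.0])
```

Iterating this update and averaging the strategies over time is what produces the convergence guarantee quoted above.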

Training Time

According to the paper, training ran for 8 days on a 64-core server and used 12,400 CPU core hours and less than 512 GB of memory. Assuming current cloud rates, they said it would cost about $144 to produce.
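As a quick sanity check on those numbers (my arithmetic, not the paper's):

```python
# 8 days of wall-clock time on a 64-core machine
core_hours = 8 * 24 * 64   # 12,288, close to the reported 12,400

# Implied cloud price per core-hour at the quoted $144 total
rate = 144 / 12_400        # roughly a penny per core-hour
```

The striking part is how cheap that is compared to the compute budgets usually cited for game-playing AIs.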

Playing Strategy

Because the offline blueprint is "coarse" (a consequence of poker's complexity), they only use it directly in the first betting round, where they considered it safe. After that, or if a player confuses it by betting oddly, Pluribus switches to real-time search.

While real-time search has been successful in perfect-information games, there is a problem in imperfect-information games; the paper calls the naive approach "fundamentally broken." Their example is Rock/Paper/Scissors. In my previous blog post I showed that the optimal strategy plays each move with probability 1/3. A naive search would conclude each move has the same value and might just pick scissors every time. That would cause an issue: the other player would catch on and win every time with rock.

There were two alternatives used in the past. DeepStack determined a leaf node's value based on the strategy used to reach that leaf, but this doesn't scale to a large tree. The other alternative, used by Libratus, was to only search when the tree could be extended to the end of the game. That wouldn't work with the addition of extra poker players.

Pluribus instead uses a modified form of an approach that we recently designed—previously only for two-player zero-sum games (41)— in which the searcher explicitly considers that any or all players may shift to different strategies beyond the leaf nodes of a subgame. Specifically, rather than assuming all players play according to a single fixed strategy beyond the leaf nodes (which results in the leaf nodes having a single fixed value) we instead assume that each player may choose between k different strategies, specialized to each player, to play for the remainder of the game when a leaf node is reached.

Superhuman AI for multiplayer poker

They used four continuation strategies, all based on the baseline blueprint strategy: the first was the actual blueprint, the second was the blueprint with a bias toward folding, the third had a bias toward raising, and the fourth a bias toward calling.
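Biasing a strategy toward one action can be sketched as scaling that action's probability and renormalizing (the bias factor here is made up; the paper's construction differs in detail):

```python
def bias_strategy(strategy, action, factor=5.0):
    """Multiply one action's probability, then renormalize to sum to 1."""
    scaled = {a: p * (factor if a == action else 1.0)
              for a, p in strategy.items()}
    total = sum(scaled.values())
    return {a: p / total for a, p in scaled.items()}

blueprint = {"fold": 0.2, "call": 0.5, "raise": 0.3}
fold_biased = bias_strategy(blueprint, "fold")  # fold probability jumps
```

Letting the search assume each player might shift to any of these continuations is what keeps the leaf values from being a single fixed number.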


Another issue in an imperfect-information game is bluffing. The naive strategy is to bet with the best hand and fold with the worst, but if you do this all the time, the other players will know what you have. To help solve this, Pluribus determines the probability of reaching the current point with each possible hand. It then computes a strategy that is balanced across those hands and acts accordingly.
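One way to picture that balancing (a toy sketch of my own, not Pluribus's actual computation): weight each hand's action probabilities by how likely you were to reach this spot holding that hand, so the observed betting frequency reveals little about the actual hand.

```python
def balanced_action_probs(reach, per_hand_strategy):
    """Average per-hand action probabilities, weighted by reach probability."""
    total = sum(reach.values())
    probs = {}
    for hand, weight in reach.items():
        for action, p in per_hand_strategy[hand].items():
            probs[action] = probs.get(action, 0.0) + (weight / total) * p
    return probs

reach = {"aces": 0.1, "junk": 0.9}      # mostly junk reaches this spot
per_hand = {
    "aces": {"bet": 1.0, "fold": 0.0},
    "junk": {"bet": 0.2, "fold": 0.8},  # junk still bets sometimes (a bluff)
}
mix = balanced_action_probs(reach, per_hand)  # overall bet frequency
```

An opponent who sees a bet can't tell whether it came from aces or a balanced bluff, which is the whole point.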


Pluribus was tested against elite human players (each had won at least $1 million playing poker) in two formats. The first was five humans against one Pluribus; the second was one human against five Pluribus instances.

The measurement was milli big blinds per game (mbb/game): the number of big blinds won per 1,000 hands of poker. Pluribus ended up winning 48 mbb/game, which is considered very high and shows it was "better" than the players it faced.
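The unit works out like this (illustrative numbers, not the match data):

```python
def mbb_per_game(big_blinds_won, hands_played):
    """Milli big blinds per game: thousandths of a big blind won per hand."""
    return big_blinds_won / hands_played * 1000

# Winning 48 big blinds over 1,000 hands is a rate of 48 mbb/game
rate = mbb_per_game(48, 1000)
```

For scale, professional players generally consider anything in the low single digits of big blinds per 100 hands a solid win rate, so 48 mbb/game over a long sample is a strong result.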


I hope this helps clear up the hype around most articles written about the subject, and maybe it is a little easier to read than the Science article cited above.

I am excited to see if this will lead any poker players to change their game or if this will kill the online poker world.