I Graduated!!!

After 3 years and 10 classes (plus 1 I dropped when the baby was born) I finally graduated. I now have a master’s degree in computer science with a specialization in machine learning.

Where to now? I am not sure. I would really like to get working with Amazon DeepRacer. I also want to get involved with Google Development Experts program with TensorFlow. So, my learning is just beginning.

In the near future I will post a few things from school that I have completed. I just created a project that does an introduction to reinforcement learning that I used for my Iowa Code Camp talk.

Ideally, after spending some time with my family I would like to keep this blog up to date with what I am working with.

Until next time!


Machine Learning

Since I am starting my next Master’s class today and I am still struggling with creating the Android applications I am going to just put up a short summary post.

School: I am starting 7641 Machine Learning. It has the same teachers that my summer class did and actually the last part of the class is a high level of the same topics. We will see if it was as hard as RL.

Blogs: Here are some blogs and other sort of things that I try and stay on top of:

Until life or a breakthrough in the Android app I will leave this blog as is.

Image Recognition w/ Tensor Flow

I am going to work through an easy example to get image recognition up and running using TensorFlow and an example from Google’s CodeLab. The idea is that you take all of the training from Google and then insert your images into the last layer of the NN. Unless you have 100M+ images to train on this is as good as you are going to get.

To begin, I am running Windows 10 with Visual Studio 2017 using Anaconda 4.4.0.

To install TensorFlow on Windows start here.

Next, since I am going to northern Minnesota this week to go fishing I decided to change the flowers to pictures of fish. First, I created a folder called ‘fish’ in the root directory of my code. Under that I created “Muskellunge”, “Northern Pike”, and “Walleye” and filled them with pictures of those fish from Google.

Download the retrain scripts from here:

curl -O https://raw.githubusercontent.com/tensorflow/tensorflow/r1.1/tensorflow/examples/image_retraining/retrain.py

Start TensorBoard to see some pretty graphics:

python -m tensorflow.tensorboard --logdir training_summaries

Run the retrain script:

python retrain.py --bottleneck_dir=bottlenecks --how_many_training_steps=500 --model_dir=inception --summaries_dir=training_summariesbasics --output_graph=retrained_graph.pb --output_labels=retrained_labels.txt --image_dir=fish

This will take some time to run so go grab a drink.

Once this is done you will see “retrained_graph.pb” and “retrained_labels.txt” in the root directory. They were supposed to be in the folder ‘fish’ and I am not sure why there are not there. But, it won’t matter.

Create the ‘ImageRecon.py’ file that will use the pb and your test image to see if your code works. I am using a picture of my brother from a few years ago holding a smaller muskie, a picture of myself holding a northern, a picture of my dad holding a walleye, a random picture of a bass I caught, and an article about the Lumineers. When I run the code it returns great results for the muskie, solid results for the walleye, pretty good results for the article since it can’t determine which type of fish it is and expected results with the bass since it just sees a fish. But, the northern pike was terrible. This “sort of” makes sense because people confuse them in the wild all the time.

Testing: muskie.jpg
        muskellunge (score = 0.99796)
        northern pike (score = 0.00155)
        walleye (score = 0.00049)
Testing: northern.jpg
        muskellunge (score = 0.99884)
        walleye (score = 0.00113)
        northern pike (score = 0.00002)
Testing: article.jpg
        walleye (score = 0.66850)
        muskellunge (score = 0.28944)
        northern pike (score = 0.04206)
Testing: bass.jpg
        walleye (score = 0.99690)
        northern pike (score = 0.00245)
        muskellunge (score = 0.00065)
Testing: walleye.jpg
        walleye (score = 0.86294)
        northern pike (score = 0.13007)
        muskellunge (score = 0.00700)

After some testing I decided that the picture I was using wasn’t clearly showing the stripes on the northern and the NN was processing me instead of the fish. To resolve, I changed the picture to something more clear and zoomed in.

Testing: muskie.jpg
        muskellunge (score = 0.99784)
        northern pike (score = 0.00171)
        walleye (score = 0.00045)
Testing: northern.jpg
        northern pike (score = 0.97309)
        muskellunge (score = 0.01992)
        walleye (score = 0.00700)
Testing: article.jpg
        walleye (score = 0.70012)
        muskellunge (score = 0.25732)
        northern pike (score = 0.04255)
Testing: bass.jpg
        walleye (score = 0.99765)
        northern pike (score = 0.00187)
        muskellunge (score = 0.00048)
Testing: walleye.jpg
        walleye (score = 0.85972)
        northern pike (score = 0.13590)
        muskellunge (score = 0.00438)

Much better!

My next step is to figure out a way to use this in real life. At the very least, figure out a way to get this on my phone so I can take random pictures. I know that Silicon Valley had an app similar called ‘Hot Dog or Not Hot Dog’. There is a write up about it that sounds like a pretty cool program to make since they were able to compile it down to work on the phone without any backend cloud. If all else fails, I can run another CodeLab which explains how to put it in Android.

Full Source: GitHub

Because this process generates some massive files I have ignored the large ones that are generated by this process

Rock, Paper, Scissors in Linear Programming

Today I want to work through some stuff I struggled with in my current class. I haven’t done linear programming before in my life (college was a long time ago). Well, that changed a few weeks ago while trying to replicate a graph for Foe-Q and a graph for Correlated Equilibrium (CE-Q) of a zero sum game. In order to work my way through the assignment I started small and used the game of Rock, Paper, Scissors. The “trick” is adding the ‘U’ variable that you will minimize.

For the solver, I went with cvxopt and glpk since it was an order of magnitude faster than anything else. But, since I was on Windows it was a pain to get installed and I didn’t get nearly the speed up that Linux users got.

from cvxopt import matrix, solvers
glpksolver = ‘cvxopt_glpk’

For the ‘c’ matrix we need to set the first column to -1 since we need to minimize the U variable to get the correct value.

#Minimize: U + R + P + S
c = matrix([-1.,0.,0.,0.])

For the G matrix we need to build our equations. The first 3 set up the results. The first row is throwing Rock where you lose (-1) to Paper but win (+1) to Scissors. Row 2 is Paper’s results and row 3 is Scissors.

#1. U – P + S <= 0 => [1, 0,-1, 1]
#2. U + R – S <= 0 => [1, 1, 0,-1]
#3. U – R + P <= 0 => [1,-1, 1, 0]

The second set of 3 rows is to ensure you can only throw a single play. Not, since we are using cvxopt we need to negate these. So, instead of R greater than or equal to 0 we need -R less than or equal to 0.

#4. -R <= 0 => [0,-1, 0, 0]
#5. -P <= 0 => [0, 0,-1, 0]
#6. -S <= 0 => [0, 0, 0,-1]

The last set are the equality parameters. This ensure that we get a total 1 when the game is over. You can’t throw more than 1 play.

#7. R + P + S <= 1 => [0, 1, 1, 1]
#8. -R – P – S <=-1 => [0,-1,-1,-1]

The h matrix is the results of the previous 8 equations.

h = matrix([0.,0.,0.,0.,0.,0.,1.,-1.])

Here is the code:

from cvxopt import matrix, solvers
glpksolver = ‘cvxopt_glpk’
#Minimize: U + R + P + S
c = matrix([-1.,0.,0.,0.])
#1. U – P + S <= 0 => [1, 0,-1, 1]
#2. U + R – S <= 0 => [1, 1, 0,-1]
#3. U – R + P <= 0 => [1,-1, 1, 0]
#4. -R <= 0 => [0,-1, 0, 0]
#5. -P <= 0 => [0, 0,-1, 0]
#6. -S <= 0 => [0, 0, 0,-1]
#7. R + P + S <= 1 => [0, 1, 1, 1]
#8. -R – P – S <=-1 => [0,-1,-1,-1]
G = matrix([[1.,1.,1.,0.,0.,0.,0.,0.],[0.,1.,-1.,-1.,0.,0.,1.,-1],[-1.,0.,1.,0.,-1.,0.,1.,-1.],[1.,-1.,0.,0.,0.,-1.,1.,-1.]])
h = matrix([0.,0.,0.,0.,0.,0.,1.,-1.])
sol = solvers.lp(c,G,h, solver=glpksolver)

To wrap this up, I was able to get the Foe-q graph replicated by setting up the linear equation to solve for minimax. But, CE-Q defeated me.

K-Armed Bandit

In this post I am going to walk through the k-armed bandit. I will be using Professor Sutton’s book “Reinforcement Learning: An Introduction”. A link to an updated version is on his web site located here. I will be walking through coding up the figures in Chapter 2. I will be doing the normal k-armed bandit in section 2.1 and 2.2, the Softmax in section 2.3, and finally, the Optimistic Initial Values in section 2.7. As always, I will post my code in my GitHub repository.

K-Armed Bandit:
First, lets talk about the k-armed bandit problem. According to wikipedia, it is a problem in which a gambler at a row of slot machines has to decide which lever to pull to maximize his winnings. Each machine returns a random reward from a probability distribution specific to that machine. The goal is to maximize your reward through a sequence of lever pulls.

In practice, this model could be used to manage research project, or any R&D department, when the rewards are not known to determine where to allocate resources. Does this ever happen? Probably not. But, it is a cool theory and I can code it!

Action-Value (figure 2.1):
In this figure, Professor Sutton is using the epsilon greedy method, which is where you explore epsilon percent and exploit the rest. He sets up 2,000 randomly generated k-armed bandit tasks with 10 arms. The rewards are returned from a normal (Gaussian) probability distribution (numpy has a method for this, thankfully) with mean of Q*(a) and variance 1. Q* was generated as a normal distribution with mean 0 and variance 1. Lets get started.

First, creating Q* from the normal distribution of mean 0 and variance of 1. We will also need to create arrays for the number of times each arm was selected and the estimate values

self.q_star = np.random.normal(0,math.sqrt(1),arms)
self.K = np.zeros(arms)
self.estimate_values = np.zeros(arms)

Second, we need to create a function that handles the arm selection. If we have epsilon of 0.1 that means 10% of the time we are exploring the space and the other 90% we grab the best value

def get_arm(self):
#Randomly determine if we are going to explore or exploit
if np.random.random() > self.epsilon:
return np.argmax(self.estimate_values)
return np.random.randint(0,self.arms)

Third, we need to create a function that will generate our rewards. From the book it says that we need to take the normal probability of the mean of Q* and variance of 1.

def get_reward(self, arm):
#Normal (Guassian) probabilty of mean Q* and variance 1
reward = np.random.normal(self.q_star[arm],math.sqrt(1))
return reward

Fourth, we need to create a method that will update our local estimate array with the current rewards.

def update_estimate_values(self, arm, reward):
self.K[arm] += 1
k = self.K[arm]
#Set new reward (r1 + r2 … ra)/ka
current_total = self.estimate_values[arm]
#(k-1/k)* current_total + (1/k) * reward
self.estimate_values[arm] = ((k-1) / float(k)) * current_total + (1/float(k)) * reward

That completes our bandit code. I have uploaded this all to my GitHub account for you to see them all in action.

Here is a link to the Rewards graph
Here is a link to the Optimal Selection graph

Softmax Variation:
This is a variation on how you run the get action method. You would add the following code. Knowing that as temperature goes to 0 it becomes more greed and as temperature goes to 1 you do more exploring

def calc_softmax(self):
e = np.exp(self.est_values / self.temperature)
dist = e/np.sum(e, axis=0)
return dist

def choose_eps_greedy(self):
sf = self.calc_softmax()
action = np.random.choice(self.arms,1,p=sf)[0]
return action

instead of


Optimistic Initial Values:
Optimistic initial values is exactly what it sounds like. You set the initial values of your estimate array to a value instead of 0. You will need some domain knowledge for this to help.

def __init__(self, arms, epsilon, initial):
super().__init__(arms, epsilon)
self.estimate_values[:] = initial #Set initial value
self.step_size = 0.1 #Set alpha to 0.1

I also added in some update code using step size that was introduced before this figure.

def update_est(self, arm, k, current_total, reward):
self.estimate_values[arm] += self.step_size * (reward – current_total)

Here is a link to this graph.

Discretization with OpenAI

For my code, go to my GitHub account and look at Discretization.py.

In this post, I will continue from last time talking about discretization but will use it against an OpenAI environment. Most of this post was picked up after watching my TA work through one of his notebooks on the subject. If you want a good resource check out Miguel’s GitHub.

I am going to start with determining the state space from the OpenAI gym CartPole v0.  You can check out the GitHub for the actual source. There are 4 states. The first two are the location and the second two are the angle at which the pole is leaning. Once the pole gets +15 degrees or the cart get more than 2.5 units from the center you lose.

Next, we need to determine their values. Actions are 2 discrete states. Either push (+1) or pull (-1). The state will be the x, x_dot, theta, and theta_dot values. It wouldn’t make sense to have an infinite state space of every combination for all 4 of those so we will need to create our own.

Using a random approach we could run 10k random session against the CartPole problem and then determine their values. My results were (-1.4,1.4), (-3,3), (-0.2,0.2), and (-3.3,3.3). These are outside the acceptable bounds because they will still do the last calculations before setting the ‘done’ flag. But we can get their real values but just calling their observation space. Grabbing the high and low values and dividing them by 2 you get (-2.4,+2.4) and (-0.2,+0.2). These values make sense because a failure is when you are 2.4 away from center and a tilt of 20 degrees.

Last post, I used digitize to break the entries into bins. This time, I will use numpy’s linspace. This will return an even distribution and allow us to limit the number of states.

With some tweaking you should be able to get the states narrowed down enough that your q-learner will be able to solve a simple continuous state environment.

Next time, I will introduce solving this environment with a neural network and q-learning.


From Wikipedia, discretization is the process of transferring continuous values into discrete states. This can be done because of memory/space requirements that come up with continuous states.

In a simple q-learning environment you would have a grid of X spaces. In the Frozen Lake environment you have a 4×4 grid. So, you would set up your learner with your X actions and their expected values across 16 state spaces. You would then proceed to run the q-learning process (get action, retrieve next state and reward, etc) until you have the optimal policy.

But, when you move to a problem that doesn’t have a discrete state space you will need to discretize it. A simple array sample can be shown with numpy’s digitize call. I will walk through it.

Create your “continuous” state space (image this is huge)

state_space = np.array([0,1,2,3,4,5,6,7,8,9])

Create the bins that you can handle (lets assume that you can only store 3 spaces)

bins = np.array([0,2,7])

Now, run the digitize to combine all the numbers in state_space into their new discretized space

dis = np.digitize(state_space,bins)

dis = array([1, 1, 2, 2, 2, 2, 2, 3, 3, 3], dtype=int64)

This print out shows that numbers 0,1 are in new space 0. 2,3,4,5,6 are in space 1 and 7,8,9 are in space 2.

You can go into more depth if you don’t know the numbers. This is what I had to do with the Machine Learning for Trading grad class when it revolved around technical indicators. In that case, we took it a step further by using the digitized values from 4 indicators and then concatenated them. For example, I used an indicator determining if it was + or – from the previous day. If it was + I would give it a 1. If it was negative I would give it a 0. I would then do the same thing with if it was above an simple moving average. That would lead me to have a positive in the first indicator and a positive in the second indicator returning 11 as my state. If it was + and then -, it would have been 10. And so on.

Discretization allows you to put massive state spaces into something you can store.