Machine Learning

Since I am starting my next Master’s class today and I am still struggling with creating the Android application, I am going to put up just a short summary post.

School: I am starting 7641 Machine Learning. It has the same instructors as my summer class, and the last part of the course is actually a high-level pass over the same topics. We will see if it is as hard as RL.

Blogs: Here are some blogs and other resources that I try to stay on top of:

Until life intervenes or I have a breakthrough on the Android app, I will leave this blog as is.

Image Recognition w/ TensorFlow

I am going to work through an easy example to get image recognition up and running using TensorFlow and an example from Google’s codelab. The idea is that you take Google’s pre-trained Inception model and retrain only the last layer of the network on your own images. Unless you have 100M+ images to train on, this is about as good as you are going to get.

To begin, I am running Windows 10 with Visual Studio 2017 using Anaconda 4.4.0.

To install TensorFlow on Windows, start here.
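
If you just want the CPU-only build, a plain pip install inside the Anaconda environment is usually enough:

pip install tensorflow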

Next, since I am going to northern Minnesota this week to go fishing, I decided to change the flowers to pictures of fish. First, I created a folder called ‘fish’ in the root directory of my code. Under that I created “Muskellunge”, “Northern Pike”, and “Walleye” folders and filled them with pictures of those fish from Google.
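
The layout ends up looking like this (the retrain script uses the folder names as the labels):

fish/
    Muskellunge/
    Northern Pike/
    Walleye/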

Download the retrain script from here:

curl -O https://raw.githubusercontent.com/tensorflow/tensorflow/r1.1/tensorflow/examples/image_retraining/retrain.py

Start TensorBoard to see some pretty graphics:

python -m tensorflow.tensorboard --logdir training_summaries

Run the retrain script:

python retrain.py --bottleneck_dir=bottlenecks --how_many_training_steps=500 --model_dir=inception --summaries_dir=training_summaries/basics --output_graph=retrained_graph.pb --output_labels=retrained_labels.txt --image_dir=fish

This will take some time to run so go grab a drink.

Once this is done you will see “retrained_graph.pb” and “retrained_labels.txt” in the root directory. I expected them to end up in the ‘fish’ folder and I am not sure why they are not there, but it doesn’t matter.

Create the ‘ImageRecon.py’ file that will use the .pb file and your test images to see if your code works. I am using a picture of my brother from a few years ago holding a smaller muskie, a picture of myself holding a northern, a picture of my dad holding a walleye, a random picture of a bass I caught, and an article about the Lumineers. When I run the code it returns great results for the muskie, solid results for the walleye, pretty good results for the article (since it can’t determine which type of fish it is), and expected results with the bass (since it just sees a fish). But the northern pike was terrible. This “sort of” makes sense because people confuse northerns and muskies in the wild all the time.
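
Here is a minimal sketch of what ImageRecon.py can look like, modeled on the codelab’s label_image.py. The tensor names ‘final_result’ and ‘DecodeJpeg/contents’ are the defaults that retrain.py wires up, and the image file name is just a placeholder:

[python]
import tensorflow as tf

image_path = 'muskie.jpg'  #placeholder test image

#Read the test image and the labels written by retrain.py
image_data = tf.gfile.FastGFile(image_path, 'rb').read()
labels = [line.rstrip() for line in tf.gfile.GFile('retrained_labels.txt')]

#Load the retrained graph
with tf.gfile.FastGFile('retrained_graph.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

with tf.Session() as sess:
    #'final_result' is the default output layer name from retrain.py
    softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
    predictions = sess.run(softmax_tensor, {'DecodeJpeg/contents:0': image_data})[0]

    #Print the labels from most to least likely
    for node_id in predictions.argsort()[::-1]:
        print('%s (score = %.5f)' % (labels[node_id], predictions[node_id]))
[/python]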

Testing: muskie.jpg
        muskellunge (score = 0.99796)
        northern pike (score = 0.00155)
        walleye (score = 0.00049)
Testing: northern.jpg
        muskellunge (score = 0.99884)
        walleye (score = 0.00113)
        northern pike (score = 0.00002)
Testing: article.jpg
        walleye (score = 0.66850)
        muskellunge (score = 0.28944)
        northern pike (score = 0.04206)
Testing: bass.jpg
        walleye (score = 0.99690)
        northern pike (score = 0.00245)
        muskellunge (score = 0.00065)
Testing: walleye.jpg
        walleye (score = 0.86294)
        northern pike (score = 0.13007)
        muskellunge (score = 0.00700)

After some testing I decided that the picture I was using wasn’t clearly showing the stripes on the northern, so the NN was processing me instead of the fish. To resolve this, I switched to a clearer, zoomed-in picture.

Testing: muskie.jpg
        muskellunge (score = 0.99784)
        northern pike (score = 0.00171)
        walleye (score = 0.00045)
Testing: northern.jpg
        northern pike (score = 0.97309)
        muskellunge (score = 0.01992)
        walleye (score = 0.00700)
Testing: article.jpg
        walleye (score = 0.70012)
        muskellunge (score = 0.25732)
        northern pike (score = 0.04255)
Testing: bass.jpg
        walleye (score = 0.99765)
        northern pike (score = 0.00187)
        muskellunge (score = 0.00048)
Testing: walleye.jpg
        walleye (score = 0.85972)
        northern pike (score = 0.13590)
        muskellunge (score = 0.00438)

Much better!

My next step is to figure out a way to use this in real life. At the very least, I want to get this on my phone so I can take random pictures. I know that Silicon Valley had a similar app called ‘Hot Dog or Not Hot Dog’. There is a write-up about it that sounds like a pretty cool program to build, since they were able to compile the model down to run on the phone without any backend cloud. If all else fails, I can run another codelab which explains how to put it on Android.

Full Source: GitHub

**NOTE**
Because this process generates some massive files, I have ignored the large ones listed below:
/ImageRecon/ImageRecon/tf_files
/ImageRecon/ImageRecon/retrained_graph.pb
/ImageRecon/ImageRecon/inception/inception-2015-12-05.tgz
/ImageRecon/ImageRecon/inception/classify_image_graph_def.pb
/ImageRecon/ImageRecon/training_summaries/basics/train/events.out.tfevents.1501607839.DESKTOP
/ImageRecon/ImageRecon/training_summaries/basics/train/events.out.tfevents.1501552333.DESKTOP

Rock, Paper, Scissors in Linear Programming

Today I want to work through some stuff I struggled with in my current class. I haven’t done linear programming before in my life (college was a long time ago). Well, that changed a few weeks ago while trying to replicate a graph for Foe-Q and a graph for Correlated Equilibrium (CE-Q) of a zero-sum game. In order to work my way through the assignment I started small and used the game of Rock, Paper, Scissors. The “trick” is adding a ‘U’ variable for the value of the game and optimizing it (here, maximizing it).

For the solver, I went with cvxopt with the glpk backend since it was an order of magnitude faster than anything else. But since I was on Windows it was a pain to install, and I didn’t get nearly the speedup that Linux users got.

[python]
from cvxopt import matrix, solvers
glpksolver = 'glpk'
[/python]

For the ‘c’ matrix we set the first entry to -1. cvxopt minimizes, so minimizing -U is the same as maximizing U, which gives the correct game value.

[python]
#Objective: minimize -U (i.e., maximize U); R, P, S carry no cost
c = matrix([-1.,0.,0.,0.])
[/python]

For the G matrix we need to build our constraints. The first 3 rows set up the game results. The first row covers Rock, which loses (-1) to Paper and wins (+1) against Scissors. Row 2 is Paper’s results and row 3 is Scissors’.

[python]
#1. U - P + S <= 0 => [1, 0,-1, 1]
#2. U + R - S <= 0 => [1, 1, 0,-1]
#3. U - R + P <= 0 => [1,-1, 1, 0]
[/python]

The second set of 3 rows ensures the probabilities of each play can’t go negative. Note, since cvxopt expects constraints in the form Gx <= h, we need to negate these. So, instead of R greater than or equal to 0 we use -R less than or equal to 0.

[python]
#4. -R <= 0 => [0,-1, 0, 0]
#5. -P <= 0 => [0, 0,-1, 0]
#6. -S <= 0 => [0, 0, 0,-1]
[/python]

The last two rows encode the equality constraint: R + P + S must total exactly 1, since you have to throw exactly one play. Expressing it as two inequalities (<= 1 and >= 1) keeps everything in the Gx <= h form.

[python]
#7. R + P + S <= 1 => [0, 1, 1, 1]
#8. -R - P - S <=-1 => [0,-1,-1,-1]
[/python]

The h matrix holds the right-hand sides of the previous 8 inequalities.

[python]
h = matrix([0.,0.,0.,0.,0.,0.,1.,-1.])
[/python]

Here is the code:

[python]
from cvxopt import matrix, solvers
glpksolver = 'glpk' #use the GLPK backend
#Objective: minimize -U (i.e., maximize U); R, P, S carry no cost
c = matrix([-1.,0.,0.,0.])
#1. U - P + S <= 0 => [1, 0,-1, 1]
#2. U + R - S <= 0 => [1, 1, 0,-1]
#3. U - R + P <= 0 => [1,-1, 1, 0]
#4. -R <= 0 => [0,-1, 0, 0]
#5. -P <= 0 => [0, 0,-1, 0]
#6. -S <= 0 => [0, 0, 0,-1]
#7. R + P + S <= 1 => [0, 1, 1, 1]
#8. -R - P - S <=-1 => [0,-1,-1,-1]
#G is built column by column: [U column, R column, P column, S column]
G = matrix([[1.,1.,1.,0.,0.,0.,0.,0.],[0.,1.,-1.,-1.,0.,0.,1.,-1.],[-1.,0.,1.,0.,-1.,0.,1.,-1.],[1.,-1.,0.,0.,0.,-1.,1.,-1.]])
h = matrix([0.,0.,0.,0.,0.,0.,1.,-1.])
sol = solvers.lp(c,G,h, solver=glpksolver)
print(sol['status'])
print(sol['x'])
[/python]
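
sol['x'] comes back as [U, R, P, S]. For Rock, Paper, Scissors the known equilibrium is the uniform mixed strategy, so if everything is wired up correctly the solution should be roughly a game value U of 0 with R = P = S = 1/3.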

To wrap this up, I was able to replicate the Foe-Q graph by setting up the linear program to solve for the minimax value. But CE-Q defeated me.

K-Armed Bandit

In this post I am going to walk through the k-armed bandit. I will be using Professor Sutton’s book “Reinforcement Learning: An Introduction”; a link to an updated version is on his web site located here. I will be coding up the figures in Chapter 2: the normal k-armed bandit from sections 2.1 and 2.2, the softmax variation from section 2.3, and finally, optimistic initial values from section 2.7. As always, I will post my code in my GitHub repository.

K-Armed Bandit:
First, let’s talk about the k-armed bandit problem. According to Wikipedia, it is a problem in which a gambler at a row of slot machines has to decide which levers to pull to maximize his winnings. Each machine returns a random reward from a probability distribution specific to that machine, and the goal is to maximize the total reward over a sequence of lever pulls.

In practice, this model could be used to manage research projects, or any R&D department, where the payoffs are not known and you have to decide where to allocate resources. Does this ever happen? Probably not. But it is a cool theory and I can code it!

Action-Value (figure 2.1):
In this figure, Professor Sutton uses the epsilon-greedy method, where you explore epsilon percent of the time and exploit the rest. He sets up 2,000 randomly generated k-armed bandit tasks, each with 10 arms. The rewards are drawn from a normal (Gaussian) distribution (numpy has a method for this, thankfully) with mean Q*(a) and variance 1, and Q*(a) itself is drawn from a normal distribution with mean 0 and variance 1. Let’s get started.

First, we create Q* from a normal distribution with mean 0 and variance 1. We also need arrays for the number of times each arm has been selected and the estimated values.

[python]
import math
import numpy as np  #at the top of the file

self.q_star = np.random.normal(0, math.sqrt(1), arms)
self.K = np.zeros(arms)
self.estimate_values = np.zeros(arms)
[/python]

Second, we need a function that handles arm selection. With an epsilon of 0.1, 10% of the time we explore the space and the other 90% we grab the arm with the best estimated value.

[python]
def get_arm(self):
    #Randomly determine if we are going to explore or exploit
    if np.random.random() > self.epsilon:
        #Exploit
        return np.argmax(self.estimate_values)
    else:
        #Explore
        return np.random.randint(0, self.arms)
[/python]

Third, we need a function that generates our rewards. The book says to draw from a normal distribution with mean Q*(a) and variance 1.

[python]
def get_reward(self, arm):
    #Normal (Gaussian) distribution with mean Q*(arm) and variance 1
    reward = np.random.normal(self.q_star[arm], math.sqrt(1))
    return reward
[/python]

Fourth, we need a method that updates our estimate array with the incremental sample average of the rewards.

[python]
def update_estimate_values(self, arm, reward):
    self.K[arm] += 1
    k = self.K[arm]
    #New estimate is the sample average (r1 + r2 + ... + rk)/k
    current_total = self.estimate_values[arm]
    #((k-1)/k) * current_total + (1/k) * reward
    self.estimate_values[arm] = ((k - 1) / float(k)) * current_total + (1 / float(k)) * reward
[/python]

That completes our bandit code. I have uploaded it all to my GitHub account so you can see it in action.
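
To reproduce the figure itself, you run many independent bandit tasks and average the reward at each time step. Here is a minimal sketch of that driver loop, assuming the snippets above live in a Bandit class whose constructor takes (arms, epsilon):

[python]
import numpy as np

#Hypothetical driver: Bandit is assumed to be a class built from the snippets
#above, with get_arm, get_reward, and update_estimate_values methods.
def run_experiment(tasks=2000, pulls=1000, arms=10, epsilon=0.1):
    avg_reward = np.zeros(pulls)
    for _ in range(tasks):
        bandit = Bandit(arms, epsilon)       #fresh randomly generated task
        for step in range(pulls):
            arm = bandit.get_arm()
            reward = bandit.get_reward(arm)
            bandit.update_estimate_values(arm, reward)
            avg_reward[step] += reward
    return avg_reward / tasks                #average reward at each step
[/python]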

Here is a link to the Rewards graph
Here is a link to the Optimal Selection graph

Softmax Variation:
This is a variation on how you run the get-action method; you add the following code. As the temperature goes to 0 the selection becomes greedier, and as the temperature increases you do more exploring.

[python]
def calc_softmax(self):
    e = np.exp(self.estimate_values / self.temperature)
    dist = e / np.sum(e, axis=0)
    return dist

def choose_eps_greedy(self):
    sf = self.calc_softmax()
    action = np.random.choice(self.arms, 1, p=sf)[0]
    return action
[/python]

instead of

[python]
np.argmax(self.estimate_values)
[/python]

Optimistic Initial Values:
Optimistic initial values are exactly what they sound like: you set the initial values of your estimate array to an optimistic value instead of 0, which encourages early exploration. Some domain knowledge helps in picking that value.

[python]
def __init__(self, arms, epsilon, initial):
    super().__init__(arms, epsilon)
    self.estimate_values[:] = initial #Set initial estimates optimistically
    self.step_size = 0.1              #Set alpha to 0.1
[/python]

I also added an update that uses the constant step size (alpha) introduced before this figure.

[python]
def update_est(self, arm, k, current_total, reward):
    self.estimate_values[arm] += self.step_size * (reward - current_total)
[/python]

Here is a link to this graph.