## Discretization with OpenAI

For my code, go to my GitHub account and look at Discretization.py.

In this post, I will pick up where I left off with discretization, this time applying it to an OpenAI environment. Most of this post was picked up after watching my TA work through one of his notebooks on the subject. If you want a good resource, check out Miguel’s GitHub.

I am going to start by determining the state space of the OpenAI gym CartPole-v0 environment. You can check out the GitHub for the actual source. Each observation has 4 values. The first two are the cart's position and velocity, and the second two are the pole's angle and angular velocity. Once the pole tilts more than 12 degrees from vertical or the cart gets more than 2.4 units from the center, you lose.

Next, we need to determine their values. There are 2 discrete actions: push the cart left (0) or push it right (1). The state will be the x, x_dot, theta, and theta_dot values. It wouldn’t make sense to keep an infinite state space covering every combination of those 4 values, so we will need to create our own.

Using a random approach, we could run 10k random sessions against the CartPole problem and record the range of each value. My results were (-1.4,1.4), (-3,3), (-0.2,0.2), and (-3.3,3.3). These can land outside the acceptable bounds because the environment still does the last calculation before setting the ‘done’ flag. But we can get the real limits by just reading the observation space. Grabbing the high and low values for x and theta and dividing them by 2, you get (-2.4,+2.4) and (-0.2,+0.2). These values make sense because a failure is when the cart is 2.4 units away from center or the pole tilts about 0.2 radians (roughly 12 degrees).
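The halving step can be checked by hand. This is a minimal sketch that does not import gym: it assumes the thresholds hard-coded in the CartPole source (a cart limit of 2.4 and a pole limit of 12 degrees) and assumes the observation space reports twice those thresholds. The constant names are my own, not gym's.

```python
import math

# Assumed failure thresholds from the CartPole source (names are illustrative).
X_THRESHOLD = 2.4                    # cart distance from center
THETA_THRESHOLD = math.radians(12)   # pole tilt, about 0.209 rad

# Assumption: the observation space reports twice each threshold as its high value.
obs_high_x = 2 * X_THRESHOLD
obs_high_theta = 2 * THETA_THRESHOLD

# Dividing the reported high/low by 2 recovers the real failure bounds.
x_bounds = (-obs_high_x / 2, obs_high_x / 2)
theta_bounds = (-obs_high_theta / 2, obs_high_theta / 2)

print(x_bounds)
print(theta_bounds)
```

The two velocity terms are left out because their reported bounds are effectively unbounded, which is why the random-session ranges are still useful for them.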

Last post, I used digitize to break the entries into bins. This time, I will use numpy’s linspace to build the bin edges. This will return evenly spaced edges and allow us to limit the number of states.
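Here is a rough sketch of how linspace and digitize can work together. It is not the code from Discretization.py; the bounds are the ones quoted above, while `N_BINS`, `make_edges`, and `discretize` are hypothetical names chosen for illustration.

```python
import numpy as np

# Bounds for x, x_dot, theta, theta_dot (taken from the numbers quoted above).
BOUNDS = [(-2.4, 2.4), (-3.0, 3.0), (-0.2, 0.2), (-3.3, 3.3)]
N_BINS = 10  # hypothetical bin count per variable; tune this


def make_edges(bounds, n_bins):
    """Return the interior bin edges for each state variable."""
    # n_bins + 1 evenly spaced points; dropping both ends leaves n_bins - 1
    # interior edges, so out-of-range values fall into the first or last bin.
    return [np.linspace(lo, hi, n_bins + 1)[1:-1] for lo, hi in bounds]


def discretize(obs, edges):
    """Map a continuous observation to a tuple of bin indices."""
    return tuple(int(np.digitize(v, e)) for v, e in zip(obs, edges))


EDGES = make_edges(BOUNDS, N_BINS)
state = discretize([0.1, -0.5, 0.05, 1.0], EDGES)
print(state)
```

Each index runs from 0 to `N_BINS - 1`, so the resulting tuple can index directly into a q-table of shape `(N_BINS, N_BINS, N_BINS, N_BINS, n_actions)`.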

With some tweaking you should be able to get the states narrowed down enough that your q-learner will be able to solve a simple continuous state environment.

Next time, I will introduce solving this environment with a neural network and q-learning.

## Discretization

From Wikipedia, discretization is the process of transferring continuous values into discrete states. This can be done because of memory/space requirements that come up with continuous states.

In a simple q-learning environment you would have a grid of X spaces. In the Frozen Lake environment you have a 4×4 grid. So, you would set up your learner with your X actions and their expected values across 16 state spaces. You would then proceed to run the q-learning process (get action, retrieve next state and reward, etc) until you have the optimal policy.
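As a minimal sketch of that setup, assuming the standard tabular q-learning update and the 4 actions Frozen Lake exposes: the learning rate, discount, and the `q_update` helper are illustrative choices of mine, and no environment is actually stepped here.

```python
import numpy as np

N_STATES, N_ACTIONS = 16, 4   # 4x4 grid, 4 moves
ALPHA, GAMMA = 0.1, 0.99      # hypothetical learning rate and discount

# One row per state, one expected value per action.
q_table = np.zeros((N_STATES, N_ACTIONS))


def q_update(q, state, action, reward, next_state, done):
    """Apply one tabular q-learning update in place and return the table."""
    # A terminal transition has no future value to bootstrap from.
    target = reward if done else reward + GAMMA * np.max(q[next_state])
    q[state, action] += ALPHA * (target - q[state, action])
    return q


# Example: reaching the goal (state 15) from state 14 with a reward of 1.
q_update(q_table, state=14, action=2, reward=1.0, next_state=15, done=True)
print(q_table[14])
```

The table only works because the state is already an integer from 0 to 15, which is exactly what a continuous environment lacks.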

But, when you move to a problem that doesn’t have a discrete state space you will need to discretize it. A simple array sample can be shown with numpy’s digitize call. I will walk through it.

Create your “continuous” state space (imagine this is huge)

`state_space = np.array([0,1,2,3,4,5,6,7,8,9])`

Create the bins that you can handle (let’s assume that you can only store 3 spaces)

`bins = np.array([0,2,7])`

Now, run the digitize to combine all the numbers in state_space into their new discretized space

```
dis = np.digitize(state_space, bins)

# dis = array([1, 1, 2, 2, 2, 2, 2, 3, 3, 3], dtype=int64)
```

This printout shows that the numbers 0 and 1 land in new space 1, the numbers 2 through 6 land in space 2, and 7, 8, and 9 land in space 3. Subtract 1 from the result if you want zero-based spaces.

You can go into more depth if you don’t know the numbers. This is what I had to do in the Machine Learning for Trading grad class, where the project revolved around technical indicators. In that case, we took it a step further by digitizing the values from 4 indicators and then concatenating them. For example, I used an indicator determining if the value was + or - from the previous day. If it was +, I would give it a 1. If it was -, I would give it a 0. I would then do the same thing for whether it was above a simple moving average. A positive in the first indicator and a positive in the second indicator would return 11 as my state. If it was + and then -, it would have been 10. And so on.
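The concatenation idea can be sketched in a few lines. This is not the class project code; `combine_indicators` is a hypothetical helper, and it returns a string so that a leading 0 (as in "01") is not lost.

```python
def combine_indicators(digits):
    """Concatenate per-indicator bin values into a single state key."""
    return "".join(str(int(d)) for d in digits)


# Indicator 1: up from the previous day? Indicator 2: above the moving average?
both_positive = combine_indicators([1, 1])
up_then_below = combine_indicators([1, 0])
down_then_above = combine_indicators([0, 1])
print(both_positive, up_then_below, down_then_above)
```

The same helper extends to 4 indicators by passing 4 digits, as long as each indicator's bin value stays a single digit.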

Discretization allows you to put massive state spaces into something you can store.

## Installing OpenAI Environments on Windows 10

I had heard about OpenAI from reading about general ML and AI topics before I actually used it, mostly because it was co-founded by Elon Musk. During my most recent grad class I was forced to use it to solve a project. This was one of the more frustrating assignments I have ever had. But, like most things like this, it was the most rewarding when I finally solved it.

The first thing to come up was that OpenAI environments (gyms) are not supported on Windows. Well, since I spend my day on Windows working in Visual Studio, I decided to try to force my way through and get everything running. For the most part it worked. There are some big limitations, but you can still interact with the environment. Of course, I could have created a Docker image and run in that, but what fun would that be?

Here are my trials and tribulations.

Installing Visual Studio 2017 with the Python Plugin:
This is a simple install/modify of Visual Studio. [Link]

Installing Anaconda:
This is also pretty straightforward. I had to download Anaconda 4.4 [Link] for Python 3.6.

Installing the Gyms:
This is the part where things get hairy, since most of these items are NOT designed for or supported on Windows. Because of this, you have to manually install some things and accept that others just won’t work.

From the Anaconda Prompt:

```
git clone https://github.com/openai/gym
cd gym
pip install -e .
pip install gym
pip install gym[all]
```

This will download all of the gyms but you WILL see errors. The biggest is that you can’t run ‘make’ on Windows for each of the Atari gyms. This is something I want to try and figure out.

Swig:
You will need to download swig.exe [Link] and add it to your environment variables to build the Box2D items (I had to solve the Lunar Lander).

Once these are done, you can run a test application to ensure things are working.

```
#***************
import gym

env = gym.make('CartPole-v0')
env.reset()
for _ in range(1000):
    env.render()
    env.step(env.action_space.sample())  # take a random action
env.close()  # close the render window when finished
#***************
```

Box2D:
Box2D doesn’t work on Windows out of the box, so I went to lfd.uci.edu/~gohlke [Link] to download the wheel file. I then installed that wheel file using ‘pip install’.

Some Known Issues:
ffmpeg [Download] wouldn’t install for me even though I could run it from the command line. This kept me from recording the videos that were posted to OpenAI. It wasn’t a huge deal but was a pain. To make the ‘wrapper’ class work so that I could at least upload the results, I had to set video_callable to False.

`wrappers.Monitor(env,tempdir,force=True,video_callable=False)`

Atari gyms won’t work without the ‘make’ commands. There are active discussions on the GitHub repo for OpenAI about getting them to work.