Discretization with OpenAI

For my code, go to my GitHub account and look at Discretization.py.

In this post, I will continue from last time talking about discretization but will use it against an OpenAI environment. Most of this post was picked up after watching my TA work through one of his notebooks on the subject. If you want a good resource check out Miguel’s GitHub.

I am going to start with determining the state space from the OpenAI gym CartPole v0.  You can check out the GitHub for the actual source. There are 4 states. The first two are the location and the second two are the angle at which the pole is leaning. Once the pole gets +15 degrees or the cart get more than 2.5 units from the center you lose.

Next, we need to determine their values. Actions are 2 discrete states. Either push (+1) or pull (-1). The state will be the x, x_dot, theta, and theta_dot values. It wouldn’t make sense to have an infinite state space of every combination for all 4 of those so we will need to create our own.

Using a random approach we could run 10k random session against the CartPole problem and then determine their values. My results were (-1.4,1.4), (-3,3), (-0.2,0.2), and (-3.3,3.3). These are outside the acceptable bounds because they will still do the last calculations before setting the ‘done’ flag. But we can get their real values but just calling their observation space. Grabbing the high and low values and dividing them by 2 you get (-2.4,+2.4) and (-0.2,+0.2). These values make sense because a failure is when you are 2.4 away from center and a tilt of 20 degrees.

Last post, I used digitize to break the entries into bins. This time, I will use numpy’s linspace. This will return an even distribution and allow us to limit the number of states.

With some tweaking you should be able to get the states narrowed down enough that your q-learner will be able to solve a simple continuous state environment.

Next time, I will introduce solving this environment with a neural network and q-learning.


From Wikipedia, discretization is the process of transferring continuous values into discrete states. This can be done because of memory/space requirements that come up with continuous states.

In a simple q-learning environment you would have a grid of X spaces. In the Frozen Lake environment you have a 4×4 grid. So, you would set up your learner with your X actions and their expected values across 16 state spaces. You would then proceed to run the q-learning process (get action, retrieve next state and reward, etc) until you have the optimal policy.

But, when you move to a problem that doesn’t have a discrete state space you will need to discretize it. A simple array sample can be shown with numpy’s digitize call. I will walk through it.

Create your “continuous” state space (image this is huge)

state_space = np.array([0,1,2,3,4,5,6,7,8,9])

Create the bins that you can handle (lets assume that you can only store 3 spaces)

bins = np.array([0,2,7])

Now, run the digitize to combine all the numbers in state_space into their new discretized space

dis = np.digitize(state_space,bins)

dis = array([1, 1, 2, 2, 2, 2, 2, 3, 3, 3], dtype=int64)

This print out shows that numbers 0,1 are in new space 0. 2,3,4,5,6 are in space 1 and 7,8,9 are in space 2.

You can go into more depth if you don’t know the numbers. This is what I had to do with the Machine Learning for Trading grad class when it revolved around technical indicators. In that case, we took it a step further by using the digitized values from 4 indicators and then concatenated them. For example, I used an indicator determining if it was + or – from the previous day. If it was + I would give it a 1. If it was negative I would give it a 0. I would then do the same thing with if it was above an simple moving average. That would lead me to have a positive in the first indicator and a positive in the second indicator returning 11 as my state. If it was + and then -, it would have been 10. And so on.

Discretization allows you to put massive state spaces into something you can store.