In this post, I will continue from last time talking about discretization but will use it against an OpenAI environment. Most of this post was picked up after watching my TA work through one of his notebooks on the subject. If you want a good resource check out Miguel’s GitHub.
I am going to start with determining the state space from the OpenAI gym CartPole v0. You can check out the GitHub for the actual source. There are 4 states. The first two are the location and the second two are the angle at which the pole is leaning. Once the pole gets +15 degrees or the cart get more than 2.5 units from the center you lose.
Next, we need to determine their values. Actions are 2 discrete states. Either push (+1) or pull (-1). The state will be the x, x_dot, theta, and theta_dot values. It wouldn’t make sense to have an infinite state space of every combination for all 4 of those so we will need to create our own.
Using a random approach we could run 10k random session against the CartPole problem and then determine their values. My results were (-1.4,1.4), (-3,3), (-0.2,0.2), and (-3.3,3.3). These are outside the acceptable bounds because they will still do the last calculations before setting the ‘done’ flag. But we can get their real values but just calling their observation space. Grabbing the high and low values and dividing them by 2 you get (-2.4,+2.4) and (-0.2,+0.2). These values make sense because a failure is when you are 2.4 away from center and a tilt of 20 degrees.
Last post, I used digitize to break the entries into bins. This time, I will use numpy’s linspace. This will return an even distribution and allow us to limit the number of states.
With some tweaking you should be able to get the states narrowed down enough that your q-learner will be able to solve a simple continuous state environment.
Next time, I will introduce solving this environment with a neural network and q-learning.