Information Gain in Decision Trees

One of the ways to determine HOW to break apart a decision tree is by using the Information Gain on that column. The idea is that it tells you how much you gain by splitting on that column. For example, if you split and ALL of the values in the left child have the same value in the Target column you don’t gain anything and your value is 0. If they are split in the middle you get a gain of 1.

The equation is:

$IG(A) = Entrophy(Target-or-Prior) - Remainder(A)$
This breaks down to:
$Entropy(Target-or-Prior) = I(P_1,P_2,..P_n) = \sum_{n=1}^{n} -p_i log_2 p_i$
$Remainder(A) = \sum_{i=1}^{v}\frac{a_i + b_i}{a+b} I(\frac{a_i}{a_i + b_i};\frac{b_i}{a_i + b_i})$

Now that we have the ugly formula out of the way lets put in some actual numbers to determine how we want to split on this tree

 Sample Clouds Sun Rain 1 Yes Yes No 2 Yes Yes No 3 No Yes No 4 No No Yes 5 Yes Yes Yes 6 No No No 7 No No Yes 8 Yes Yes Yes 9 No Yes No 10 Yes No No

In this case ‘Rain’ is our Target column. We have 6 no and 4 yes values. Our first step is to find the Entropy of Rain. We then need to find the Remainder of Clouds and Sun. Finally, we find the Information Gain and use that to break out tree.

Entropy of Rain

If we remember $I(P_1,P_2) = \sum_{i=1}^{n} -p_i log_2 p_i$

We can now put in the actual numbers:

$I(\frac{6}{10},\frac{4}{10}) = - \frac{6}{10} log_2 \frac{6}{10} + - \frac{4}{10} log_2 \frac{4}{10}$

$I(\frac{6}{10},\frac{4}{10}) = -0.6 * -0.7 + -0.4 * -1.3$

$I(\frac{6}{10},\frac{4}{10}) = 0.42 + 0.52$

$I(\frac{6}{10},\frac{4}{10}) = 0.94$

We know that the Entropy of Rain is 0.94.

On a side note, $I(0,1) = 0$ AND $I(\frac{1}{2},\frac{1}{2}) = 1$.

Remainder of Cloud and Sun

If we remember the formula is $Remainder(A) = \sum_{i=1}^{v}\frac{a_i + b_i}{a+b} I(\frac{a_i}{a_i + b_i};\frac{b_i}{a_i + b_i})$

Let get some actual numbers and break down the formula.

 Cloud No Yes No 3 3 Yes 2 2
 Sun No Yes No 2 4 Yes 2 2

For the remainder we need to calculate the number of Yes/No for each of the Yes/No from Rain. These are the values in the grids above.

$Remainder(A) = \sum_{i=1}^{v}\frac{a_i + b_i}{a+b} I(\frac{a_i}{a_i + b_i},\frac{b_i}{a_i + b_i})$

$Remainder(Cloud) = \frac{6}{10}I(\frac{3}{6},\frac{3}{6})+\frac{4}{10}I(\frac{2}{4},\frac{2}{4})$

WHERE $\frac{6}{10}$ comes from the number of No’s total. $\frac{3}{6}$ comes from the number of No’s that are ALSO No’s in the Rain column. The second $\frac{3}{6}$ from the number of Yes’s that are ALSO No’s in the Rain column.

$\frac{4}{10}$ comes from the number of Yes’s total. $\frac{2}{4}$ comes from the number of No’s that are ALSO Yes’s in the Rain column. The second $\frac{2}{4}$ from the number of Yes’s that are ALSO Yes’s in the Rain column.

$Remainder(Cloud) = \frac{6}{10}(-\frac{3}{6} log_2 \frac{3}{6} + -\frac{3}{6} log_2 \frac{3}{6} )+\frac{4}{10}I(-\frac{2}{4} log_2 \frac{2}{4} + -\frac{2}{4} log_2 \frac{2}{4} )$

$Remainder(Cloud) = \frac{6}{10}(-.5 * -1 + -.5 * -1)+\frac{4}{10}I( -.5 * -1 + -.5 * -1 )$

$Remainder(Cloud) = \frac{6}{10}+\frac{4}{10}$

$Remainder(Cloud) = 1$

Now, lets do Sun

$Remainder(Sun) = \frac{6}{10}I(\frac{2}{6},\frac{4}{6})+\frac{4}{10}I(\frac{2}{4},\frac{2}{4})$

WHERE $\frac{6}{10}$ comes from the number of No’s total. $\frac{2}{6}$ comes from the number of No’s that are ALSO No’s in the Rain column. $\frac{4}{6}$ from the number of Yes’s that are ALSO No’s in the Rain column.

$\frac{4}{10}$ comes from the number of Yes’s total. $\frac{2}{4}$ comes from the number of No’s that are ALSO Yes’s in the Rain column. The second $\frac{2}{4}$ from the number of Yes’s that are ALSO Yes’s in the Rain column.

$Remainder(Sun) = \frac{6}{10}(-\frac{2}{6} log_2 \frac{2}{6} + -\frac{4}{6} log_2 \frac{4}{6} )+\frac{4}{10}I(-\frac{2}{4} log_2 \frac{2}{4} + -\frac{2}{4} log_2 \frac{2}{4} )$

$Remainder(Sun) = \frac{6}{10}(-.33 * -1.58 + -.66 * -.58)+\frac{4}{10}I( -.5 * -1 + -.5 * -1 )$

$Remainder(Sun) = \frac{6}{10}(.52 +.38)+\frac{4}{10}(1)$

$Remainder(Sun) = \frac{6}{10}(.9)+\frac{4}{10}(1)$

$Remainder(Cloud) = .54 + .4$

$Remainder(Cloud) = .94$

Information Gain

Remember $IG(A) = Entropy(Rain) - Remainder(A)$

$IG(Cloud) = 0.94 - 1 = -0.06$

$IG(Sun) = 0.94 - 0.94 = 0.0$

So, Sun has the highest Information gain and that is how you should start your decision tree.

Now, I made this simple since I only had 2 columns but if you had more columns you would need to do this again using Sun as the Target and build off of that.