Talk:Lecture 21 - Decision Trees(Continued) OldKiwi - Rhea

--- Virtue of Decision Trees (DT) when the # of categories is large

We discussed about the virtue of DT when the # of categories is large in class. Here is some more thoughts about it.

To understand the virtue of DT, let's consider it in comparison to classifiers which based on density estimation or seperation hyperplanes. As I see it, the most significant difference between the two groups of methods is that the 2nd group tries to separate all samples in a category from the rest at one shoot, while DT adopts a divide-and-conquer strategy - partition samples into smaller sub-groups which can be separated from the rest more easily.

When # categories is larger than 2, the method based on density estimation or seperation hyperplane has to consider pair-wise or one-vs-all discrimination between categories. The problem gets a lot more difficult when then # categories becomes large. However, for DT this would not be a big issue. DT aims to find a set of rules which partitions the data space into several subspaces in which the sample impurity is minimized. In constructing the rules, we focused on the internal impurity within subspaces rather than the capacity of discrimination between categories. That's why # of categories would affect DT much (or at least not as much as to the other methods).

--- About definition of Entropy Impurity

In class, there was a question about the base of log used in entropy impurity. Here is a thought:

Change the base of log is equivalent to multiply it by a constact factor. So, as long as we use the same base in calculating impurity for every node of the tree, there should not be a difference in the rules obtained.

Talk:Lecture 21 - Decision Trees(Continued) OldKiwi - Rhea

Alumni Liaison