Difference between revisions of "Talk:Lecture 21 - Decision Trees(Continued) OldKiwi" - Rhea

Latest revision as of 10:56, 7 April 2008

--- Virtue of Decision Trees (DT) when the # of categories is large

We discussed about the virtue of DT when the # of categories is large in class. Here is some more thoughts about it.

To understand the virtue of DT, let's consider it in comparison to classifiers which based on density estimation or seperation hyperplanes. As I see it, the most significant difference between the two groups of methods is that the 2nd group tries to separate all samples in a category from the rest at one shoot, while DT adopts a divide-and-conquer strategy - partition samples into smaller sub-groups which can be separated from the rest more easily.

When # categories is larger than 2, the method based on density estimation or seperation hyperplane has to consider pair-wise or one-vs-all discrimination between categories. The problem gets a lot more difficult when then # categories becomes large. However, for DT this would not be a big issue. DT aims to find a set of rules which partitions the data space into several subspaces in which the sample impurity is minimized. In constructing the rules, we focused on the internal impurity within subspaces rather than the capacity of discrimination between categories. That's why # of categories would affect DT much (or at least not as much as to the other methods).

--- About definition of Entropy Impurity

In class, there was a question about the base of log used in entropy impurity. Here is a thought:

Change the base of log is equivalent to multiply it by a constact factor. So, as long as we use the same base in calculating impurity for every node of the tree, there should not be a difference in the rules obtained.

@@ Line 1: / Line 1: @@
-Virtue of Decision Trees (DT) when the # of categories is large
+--- Virtue of Decision Trees (DT) when the # of categories is large
 We discussed about the virtue of DT when the # of categories is large in class. Here is some more thoughts about it.
@@ Line 7: / Line 7: @@
 When # categories is larger than 2, the method based on density estimation or seperation hyperplane has to consider pair-wise or one-vs-all discrimination between categories. The problem gets a lot more difficult when then # categories becomes large. However, for DT this would not be a big issue. DT aims to find a set of rules which partitions the data space into several subspaces in which the sample impurity is minimized. In constructing the rules, we focused on the internal impurity within subspaces rather than the capacity of discrimination between categories. That's why # of categories would affect DT much (or at least not as much as to the other methods).
-About definition of Entropy Impurity
+--- About definition of Entropy Impurity
 In class, there was a question about the base of log used in entropy impurity. Here is a thought:
 Change the base of log is equivalent to multiply it by a constact factor. So, as long as we use the same base in calculating impurity for every node of the tree, there should not be a difference in the rules obtained.

Difference between revisions of "Talk:Lecture 21 - Decision Trees(Continued) OldKiwi" - Rhea

Latest revision as of 10:56, 7 April 2008

Alumni Liaison