Difference between revisions of "Lecture 21 - Decision Trees(Continued) OldKiwi" - Rhea

Revision as of 22:49, 1 April 2008

Decision trees are particularly well suited to problems in which the number of categories, c, is large.

Example: Consider the query "Identify the fruit" from a set of c=7 categories {watermelon, apple, grape, lemon, grapefruit, banana, cherry}.

One possible decision tree based on simple queries is the following:
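As a rough illustration of such a tree, the sketch below encodes one hypothetical tree over the seven fruits as nested Python tuples and dictionaries. The queries (color, size, shape), their answers, and the names `FRUIT_TREE` and `classify` are assumptions chosen for this example, not necessarily the tree used in the lecture.

```python
# Hypothetical decision tree for the seven fruit categories.
# Internal nodes are (query, {answer: subtree}) pairs; leaves are category names.
# The queries and answers are illustrative assumptions.
FRUIT_TREE = ("color", {
    "green": ("size", {
        "big": "watermelon",
        "medium": "apple",
        "small": "grape",
    }),
    "yellow": ("shape", {
        "round": ("size", {
            "big": "grapefruit",
            "small": "lemon",
        }),
        "thin": "banana",
    }),
    "red": ("size", {
        "medium": "apple",
        "small": "cherry",
    }),
})


def classify(tree, sample):
    """Walk from the root to a leaf, answering one simple query per node."""
    while not isinstance(tree, str):
        query, branches = tree
        tree = branches[sample[query]]
    return tree


print(classify(FRUIT_TREE, {"color": "yellow", "shape": "round", "size": "small"}))  # lemon
```

Note that the same category (here, apple) may label more than one leaf, and that each sample only needs answers to the queries along its own path.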

To construct a decision tree for a given classification problem, we have to answer these three questions:

1) Which question should be asked at a given node - "Query Selection"

2) When should we stop asking questions and declare the node to be a leaf - "When should we stop splitting"

3) Once a node is decided to be a leaf, what category should be assigned to this leaf - "Leaf classification"

We shall discuss questions 1 and 2; question 3 is comparatively simple.
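For question 3, a common convention is to assign a leaf the category that occurs most often among the training samples reaching it. The sketch below assumes this majority-vote rule; it is not stated in the lecture, and `leaf_label` is a hypothetical helper name.

```python
from collections import Counter


def leaf_label(labels):
    """Return the most frequent category among the samples at a leaf."""
    # Counter.most_common(1) returns [(label, count)] for the top label;
    # labels with equal counts are ordered by first occurrence.
    return Counter(labels).most_common(1)[0][0]


print(leaf_label(["apple", "grape", "apple", "cherry"]))  # apple
```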

We need to define the 'impurity' of a dataset such that $$ impurity = 0 $$ when all the training data belong to one class.

Impurity is large when the training data contain equal percentages of each class, that is, when

$P(\omega _i) = \frac{1}{C}$ for all $$ i $$, where $$ C $$ is the number of classes.

Let $$ I $$ denote the impurity. Impurity can be defined in the following ways:

"Entropy Impurity":

$I = -\sum_{j}P(\omega _j)\log_2P(\omega _j)$
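A minimal sketch of the entropy impurity, written with the conventional leading minus sign, $I = -\sum_{j}P(\omega _j)\log_2P(\omega _j)$, so that the impurity is nonnegative, and with the usual convention that a term with $P(\omega _j) = 0$ contributes zero. The function name `entropy_impurity` and the choice of 7 classes (borrowed from the fruit example) are assumptions made for illustration.

```python
import math


def entropy_impurity(probs):
    """Entropy impurity in bits: the sum over j of -P(w_j) * log2 P(w_j)."""
    # Terms with P = 0 are skipped, following the convention 0 * log2(0) = 0.
    return sum(-p * math.log2(p) for p in probs if p > 0)


c = 7  # number of categories, as in the fruit example
pure = [1.0] + [0.0] * (c - 1)   # all training data belong to one class
uniform = [1.0 / c] * c          # equal percentage of each class

print(entropy_impurity(pure))     # zero impurity for a pure node
print(entropy_impurity(uniform))  # close to log2(7), the maximum for 7 classes
```

The two cases match the requirements stated above: the impurity vanishes when all the data belong to one class and is largest when every class is equally likely.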

@@ Line 25: / Line 25: @@
 Impurity is large when the training data contain equal percentages of each class
-<math> P(\omega _i) = \frac{1}{C}; for all i </math>
+<math> P(\omega _i) = \frac{1}{C} </math>; for all <math>i</math>
+Let <math> I </math> denote the impurity. Impurity can be defined in the following ways:
+* "Entropy Impurity":
+<math>I = -\sum_{j}P(\omega _j)\log_2P(\omega _j)</math>