Revision as of 17:21, 6 April 2008

Note: Most tree growing methods favor greatest impurity reduction near the root node.

Ex.

[Figure 1: image Lecture22_DecisionTree_OldKiwi.JPG]
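The impurity reduction mentioned in the note above can be made concrete with a small sketch. The notes do not say which impurity measure is meant, so the Gini impurity used here is an assumption, and the function names `gini` and `impurity_reduction` are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 (assumed measure; 0 for a pure or empty node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_reduction(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of its two children."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted

parent = ["a", "a", "a", "b", "b", "b"]
# A split that separates the classes completely.
good = impurity_reduction(parent, ["a", "a", "a"], ["b", "b", "b"])
# A split that leaves both children mixed.
poor = impurity_reduction(parent, ["a", "a", "b"], ["a", "b", "b"])
```

A greedy tree-growing method would pick the first split, since it gives the larger reduction.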

Assigning a category to a leaf node is easy:

If the sample data at the leaf is pure (all samples belong to one class)

-> assign that class to the leaf.

else

-> assign the most frequent class among the samples at the leaf.
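The leaf-labeling rule above can be sketched in a few lines. The function name `leaf_label` is hypothetical, and how ties between equally frequent classes are broken is not specified in the notes; this sketch simply takes whichever class `Counter` reports first.

```python
from collections import Counter

def leaf_label(labels):
    """Class for a leaf: the single class if the leaf is pure, else the most frequent one."""
    if not labels:
        raise ValueError("a leaf must contain at least one training sample")
    # most_common(1) covers both cases, since a pure leaf contains only one class.
    return Counter(labels).most_common(1)[0][0]

pure_leaf = leaf_label(["cat", "cat", "cat"])
mixed_leaf = leaf_label(["cat", "dog", "dog"])
```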

Note: The problem of building a decision tree is "ill-conditioned",

i.e. small variations in the training data can yield large variations in the decision rules obtained.

Ex. p. 405 (D&H)

A small move of a single sample can change the decision rules a lot.
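A toy illustration of this sensitivity, under stated assumptions: the four data points are invented for this sketch (they are not the example from D&H), the impurity measure is assumed to be Gini, and `best_split` is a hypothetical exhaustive search for the best axis-aligned root split.

```python
from collections import Counter

def gini(labels):
    """Gini impurity (assumed measure); 0 for a pure or empty node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(points, labels):
    """Return (feature index, threshold) with the lowest weighted Gini impurity.

    Ties keep the first candidate found, i.e. the lowest feature index.
    """
    best = None  # (score, feature, threshold)
    n = len(points)
    for f in range(len(points[0])):
        values = sorted({p[f] for p in points})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold midway between neighbors
            left = [y for p, y in zip(points, labels) if p[f] <= t]
            right = [y for p, y in zip(points, labels) if p[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best[1], best[2]

labels = ["a", "a", "b", "b"]
before = best_split([(1, 1), (2.4, 2.0), (2.5, 3.0), (4, 4)], labels)
# Move the second sample by 0.2 along the first feature only.
after = best_split([(1, 1), (2.6, 2.0), (2.5, 3.0), (4, 4)], labels)
```

In this toy data the root split is made on the first feature before the move and on the second feature after it, so the rule at the root changes even though only one sample moved slightly.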


References on clustering

"Data clustering, a review," A.K. Jain, M.N. Murty, P.J. Flynn [1]

"Algorithms for clustering data," A.K. Jain, R.C. Dubes [2]

"Support vector clustering," Ben-Hur, Horn, Siegelmann, Vapnik [3]

"Dynamic cluster formation using level set methods," Yip, Ding, Chan [4]

What is clustering?

The task of finding "natural" groupings in a data set.

Synonym: "unsupervised learning"

[Figure 2: image PartitionCluster_OldKiwi.jpg]

[Figure 3: image HierachichalCluster_OldKiwi.jpg]

Clustering as a useful technique for searching in databases

Clustering can be used to construct an index that allows a large dataset to be searched quickly.

  • Definition: An index is a data structure that enables sub-linear time lookup.
  • Example: the Dewey system used to index books in a library

[Figure 4: image Dewey_OldKiwi.jpg]

  • Example of Index: Face Recognition

- needs labeled face images

- the images must be clustered to obtain sub-linear search time

- the search is faster because the triangle inequality allows whole clusters to be ruled out without examining their members.

[Figure 5: image Lec22_hiercluster_OldKiwi.PNG]
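How the triangle inequality speeds up the search can be sketched as follows. This is a minimal illustration, not the method from the lecture: it assumes each cluster stores a center and a radius, uses Euclidean distance on 2-D points as a stand-in for face feature vectors, and the names `build_index` and `nearest_with_pruning` are hypothetical. For a query q, a cluster center c, and any member x of that cluster, d(q, x) >= d(q, c) - d(c, x) >= d(q, c) - radius, so a cluster whose bound is no better than the best match found so far can be skipped.

```python
import math

def build_index(clusters):
    """Turn (center, members) pairs into (center, radius, members) triples."""
    return [(c, max(math.dist(c, m) for m in members), members)
            for c, members in clusters]

def nearest_with_pruning(query, index):
    """Return (nearest member, its distance, number of distance evaluations)."""
    # Visit clusters closest-center first so a good candidate is found early.
    by_center = sorted(((math.dist(query, c), r, members) for c, r, members in index),
                       key=lambda t: t[0])
    evaluations = len(by_center)  # one distance per cluster center
    best, best_d = None, math.inf
    for d_qc, radius, members in by_center:
        # Triangle inequality: d(q, x) >= d(q, c) - d(c, x) >= d(q, c) - radius.
        if d_qc - radius >= best_d:
            continue  # no member of this cluster can beat the current best
        for m in members:
            evaluations += 1
            d = math.dist(query, m)
            if d < best_d:
                best, best_d = m, d
    return best, best_d, evaluations

# Invented toy data: two well-separated clusters of 2-D points.
clusters = [
    ((0, 0), [(0, 1), (1, 0), (-1, 0)]),
    ((10, 10), [(10, 11), (9, 10), (11, 10), (10, 9), (10, 10)]),
]
index = build_index(clusters)
best, best_d, evaluations = nearest_with_pruning((0.5, 0.2), index)
```

Here the second cluster is skipped entirely, so fewer distances are evaluated than in a scan over all eight points. With many clusters, or a hierarchy of them, the savings grow.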

  • Example: Image segmentation is a clustering problem

- dataset = the pixels in the image

- each cluster corresponds to an object in the image

[Figure 6: image Lec22_housecluster_OldKiwi.PNG]
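To make the segmentation example concrete, here is a minimal sketch that clusters the pixels of a tiny grayscale image by intensity. The notes do not name a clustering algorithm at this point; k-means with two clusters is an assumption, as are the invented image values and the function name `kmeans_1d`.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd-style k-means on scalar values, starting from the given centers."""
    centers = list(centers)

    def nearest(v):
        # Index of the center closest to value v.
        return min(range(len(centers)), key=lambda k: abs(v - centers[k]))

    for _ in range(iterations):
        labels = [nearest(v) for v in values]
        for k in range(len(centers)):
            assigned = [v for v, lab in zip(values, labels) if lab == k]
            if assigned:  # leave an empty cluster's center where it is
                centers[k] = sum(assigned) / len(assigned)
    return [nearest(v) for v in values], centers

# Invented 3x4 grayscale image: a dark region on the left, a bright one on the right.
image = [
    [10, 12, 200, 210],
    [11, 13, 205, 198],
    [9, 220, 215, 202],
]
width = len(image[0])
pixels = [v for row in image for v in row]  # dataset = pixels in image
labels, centers = kmeans_1d(pixels, centers=[0, 255])
# Reshape the cluster labels back into the image grid.
segmentation = [labels[i:i + width] for i in range(0, len(labels), width)]
```

Each entry of `segmentation` is the cluster index of the corresponding pixel, so the two regions of the image appear as the two label values.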

The input to a clustering algorithm is either

- the distances between each pair of objects in the dataset, or

- a feature vector for each object in the dataset
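The two kinds of input are related: given feature vectors and a choice of metric, the pairwise distances can be computed. A minimal sketch, assuming Euclidean distance (the notes do not fix a metric) and a hypothetical helper `pairwise_distances`:

```python
import math

def pairwise_distances(features):
    """Full symmetric matrix of Euclidean distances between feature vectors."""
    n = len(features)
    return [[math.dist(features[i], features[j]) for j in range(n)]
            for i in range(n)]

# Invented feature vectors for three objects.
features = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
D = pairwise_distances(features)
```

The reverse direction is not automatic: a distance matrix alone does not determine the feature vectors, which is why some algorithms accept only one of the two input forms.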
