Difference between revisions of "Lecture 25 - Clustering Algorithms OldKiwi" - Rhea

Revision as of 07:10, 16 April 2008

Clustering Methods

Algorithms for clustering from feature vector

Called "Partitional clustering" in Jain and Dude as opposed to "hierarchical clustering"

Clustering feature vectors = finding separation between clusters but these are not known

<<Picture>>

We have a set of points S Want to find subsets $S_1 , S_2 , \cdots , S_c$ such that $S=S_1 \cup S_2 \cup \cdots \cup S_c$ and $S_i \cap S_j = \phi$

need to define "clustering criteria" i.e., a measure of how natural the clustering is.

if c=2,

consider

$J=tr(S_m ^{-1} S_w)$

where $$ S_w $$ is "within class scatter matrix"

$S_w= \sum _{i=1, X_i \in S_1} ^d ||X_i - \mu _1||^2 + \sum _{i=1, X_i \in S_2} ^d ||X_i - \mu _2||^2$ (2-1)

$\mu _1 = \frac{1}{|S_1|} \sun _{X_i \in S_1} X_i$ , $\mu _2 = \frac{1}{|S_2|} \sun _{X_i \in S_2} X_i$ (2-2)

$$ S_m $$ is "mixture scatter matrix"

$S_m= \sum _{i=1} ^d ||X_i - \mu||^2$ , $\mu = \frac{1}{d} \sum _{i=1} ^{d} X_i$ (2-3)

Try to find $$ S_1 $$ and $$ S_2 $$ that minimize J.

Exhaustive search procedure

Examnple with 6 pattern $X_1 , X_2 , \cdots , X_6$

List all partition of 6 points into 2 sets.

<<Someting>>

Evaluate J for each partition

Cansider other J's

$J=ln|S_m ^{-1} S_w|$ (2-4)

$J=tr (S_m) - \mu (tr S_w -c)$ (2-5)

C is fixed constant, $\mu$ : Largrange multiplier

$J=\frac{tr(S_m)}{tr(S_w)}$ (2-6)

To speed up search "use iterative procedure"

Pick a partition at random

$S_1={X_1, X_2, X_3}, S_2= {X_4, X_5, X_6}$ (2-7)

Compute J

Consider effect of moving $$ X_1 $$ into $S_2 \Rightarrow \Delta J_{12}$ , $$ X_2 $$ into $S_2 \Rightarrow \Delta J_{22}$ , $$ X_3 $$ into $S_3 \Rightarrow \Delta J_{33}$ , $$ X_4 $$ into $S_1 \Rightarrow \Delta J_{41}$ , $$ X_5 $$ into $S_1 \Rightarrow \Delta J_{51}$ , $$ X_6 $$ into $S_1 \Rightarrow \Delta J_{61}$

Apply (Simultaneously) all the moves for which $\Delta J$ is negative, repeat procedure

OR

Apply the move for which $\Delta J$ is the most negative repeat procedure

Convergence?

If convergence, global minimum? No idea

If c>2, can use similar procedure

If c is unknown, try c=2,3,4, etc (hierarchical clustering)

Look at evolution of J for c increase (similarity scale)

@@ Line 14: / Line 14: @@
 We have a set of points S
 Want to find subsets <math>S_1 , S_2 , \cdots , S_c</math> such that <math>S=S_1 \cup S_2 \cup \cdots \cup S_c</math> and <math>S_i \cap S_j = \phi</math>
+need to define "clustering criteria" i.e., a measure of how natural the clustering is.
+if c=2,
+consider
+<math>J=tr(S_m ^{-1} S_w)</math>
+where <math>S_w</math> is "within class scatter matrix"
+<math>S_w= \sum _{i=1, X_i \in S_1} ^d ||X_i - \mu _1||^2 + \sum _{i=1, X_i \in S_2} ^d ||X_i - \mu _2||^2</math>  (2-1)
+<math>\mu _1 = \frac{1}{|S_1|} \sun _{X_i \in S_1} X_i</math>, <math>\mu _2 = \frac{1}{|S_2|} \sun _{X_i \in S_2} X_i</math> (2-2)
+<math>S_m</math> is "mixture scatter matrix"
+<math>S_m= \sum _{i=1} ^d ||X_i - \mu||^2</math>, <math>\mu = \frac{1}{d} \sum _{i=1} ^{d} X_i</math> (2-3)
+Try to find <math>S_1</math> and <math>S_2</math> that minimize J.
+Exhaustive search procedure
+Examnple with 6 pattern <math>X_1 , X_2 , \cdots , X_6</math>
+List all partition of 6 points into 2 sets.
+<<Someting>>
+Evaluate J for each partition
+Cansider other J's
+<math>J=ln|S_m ^{-1} S_w|</math> (2-4)
+<math>J=tr (S_m) - \mu (tr S_w -c)</math> (2-5)
+C is fixed constant, <math>\mu</math> : Largrange multiplier
+<math>J=\frac{tr(S_m)}{tr(S_w)}</math> (2-6)
+To speed up search "use iterative procedure"
+Pick a partition at random
+<math>S_1={X_1, X_2, X_3}, S_2= {X_4, X_5, X_6}</math> (2-7)
+Compute J
+Consider effect of moving <math>X_1</math> into <math>S_2 \Rightarrow \Delta J_{12}</math>,  <math>X_2</math> into <math>S_2 \Rightarrow \Delta J_{22}</math>, <math>X_3</math> into <math>S_3 \Rightarrow \Delta J_{33}</math>, <math>X_4</math> into <math>S_1 \Rightarrow \Delta J_{41}</math>, <math>X_5</math> into <math>S_1 \Rightarrow \Delta J_{51}</math>, <math>X_6</math> into <math>S_1 \Rightarrow \Delta J_{61}</math>
+Apply (Simultaneously) all the moves for which <math>\Delta J</math> is negative, repeat procedure
+OR
+Apply the move for which <math>\Delta J</math> is the most negative repeat procedure
+Convergence?
+If convergence, global minimum? No idea
+If c>2, can use similar procedure
+If c is unknown, try c=2,3,4, etc (hierarchical clustering)
+Look at evolution of J for c increase (similarity scale)

Difference between revisions of "Lecture 25 - Clustering Algorithms OldKiwi" - Rhea

Revision as of 07:10, 16 April 2008

Clustering Methods

Algorithms for clustering from feature vector

Alumni Liaison