Revision as of 10:58, 22 April 2008 by Guptam (Talk)

Clustering by finding valleys of densities

-- Approach 1: "Bump Hunting" --

   - Approximate p(x)
   - Find peaks of density
   - Expand cluster boundaries outward until they meet (hopefully in middle!)
   - Define meeting points = Cluster boundary points

-- Approach 2: "Valley Seeking" --

   - Start from valley regions of p(x)
   - Work uphill to connect data points to density peaks

Graph based implementation

   - Estimate p(x)
   - For all patterns $ X_i $, denote by $ N_r(X_i) $, the neighborhood of size $ r $ of $ X_i $. Estimate derivative of density p(x) along direction $ X_i-X_j \forall X_j\inN_r(X_i) $

$ S_i(j)=\frac{p(X_i)-p(X_j)}{||X_i-X_j||_{L_2}} $

Let $ j_{max}=argmax_j S_i(j) $

If $ S_i(j_{max})<0 $ declare $ i $ to be a root node.

If $ S_i(j_{max})>0 $ declare $ X_{j_{max}} $ to be parent of $ X_i $.

If $ S_i(j_{max})=0 $ it depends on the surroundings what link we establish.

   - If $ X_i $ is connected to a root node, then make this root node a parent of $ X_i $, else make $ X_i $ a root node.
   - Repeat

When all points are visited, a forest is constructed. Each connected component (=tree of a forest) is a cluster.

  • Problem: All methods presented above encounter a serious problem because of the following reasons:
   - Density may oscillate a lot
   - Valleys are not well defined.
  • Someone please insert figure here illustrating the above mentioned problem, as drawn in class.

PDE based valley seeking

PDE: Partial Differential Equation

PDE's can be used to minimize energy functionals

  • Simple Example

Consider 1-D curves

$ u(x): [0,1] \rightarrow \Re $ with fixed end points $ u(0)=a $, $ u(1)=b $ (3-1)

<<Picture>>

Suppose Energy of curve is $ E(u)=\int _0 ^1 F(u, u')dx $ for some function $ F: \Re ^2 \rightarrow \Re $

e.g.) $ F(u,u')=|u'|^2 $ (3-2)

The curve that minimizes (or maximizes) E(u) satisfies Euler Equation

$ \frac{\partial F} {\partial u} -\frac{d}{dx}(\frac{\partial F}{\partial u'})=0 $ (3-3)

sometimes written as $ E'=0 \Rightarrow \frac{\partial E}{\partial u}=0 $ (3-4)

Similarly if $ E=\int _0 ^1 F(u,u',u'')dx $ (3-5)

e.g.) $ F(u,u',u'')=|u''|^2 $ (3-6)

Then Euler equation is $ \frac{\partial F}{\partial u} - \frac{d}{dx}(\frac{\partial F}{\partial u'}) + \frac{d^2}{dx^2}(\frac{\partial F}{\partial u''})=0 $ (3-7)

Similarly, for surface in $ \Re ^2 $

$ u(x,y): [0,1] \times [0,1] \rightarrow \Re $ (3-8)

Suppose energy is given by

$ E(u)=\int _{surface} F(u,u_x, u_y, u_{xx}, u_{xy},u_{yy})dxdy $ (3-9)

e.g.) $ F={u_x}^2+{u_y}^2 $ (3-10)

Then Euler equation is

$ \frac{\partial F}{\partial u} - \frac{d}{dx}(\frac{\partial F}{\partial u_x}) - \frac{d}{dy}(\frac{\partial F}{\partial u_y}) + \frac{d^2}{dx^2}(\frac{\partial F}{\partial u_{xx}}) + \frac{d^2}{dy^2}(\frac{\partial F}{\partial u_{yy}}) + \frac{d^2}{dxdy}(\frac{\partial F}{\partial u_{xy}})=0 $ (3-11)

Alumni Liaison

Basic linear algebra uncovers and clarifies very important geometry and algebra.

Dr. Paul Garrett