Revision as of 11:09, 22 April 2008 by Han47 (Talk)

Clustering by finding valleys of densities

-- Approach 1: "Bump Hunting" --

   - Approximate p(x)
   - Find peaks of density
   - Expand cluster boundaries outward until they meet (hopefully in middle!)
   - Define meeting points = Cluster boundary points

-- Approach 2: "Valley Seeking" --

   - Start from valley regions of p(x)
   - Work uphill to connect data points to density peaks

Graph based implementation

   - Estimate p(x)
   - For all patterns $ X_i $, denote by $ N_r(X_i) $, the neighborhood of size $ r $ of $ X_i $.
   - Estimate derivative of density p(x) along direction $ X_i-X_j, \forall X_j\in N_r(X_i) $
           $ S_i(j)=\frac{p(X_i)-p(X_j)}{||X_i-X_j||_{L_2}} $
           Let $ j_{max}=argmax_j S_i(j) $
           If $ S_i(j_{max})<0 $ declare $ i $ to be a root node.
           If $ S_i(j_{max})>0 $ declare $ X_{j_{max}} $ to be parent of $ X_i $.
           If $ S_i(j_{max})=0 $ it depends on the surroundings what link we establish.
   - If $ X_i $ is connected to a root node, then make this root node a parent of $ X_i $, else make $ X_i $ a root node.
   - Repeat

When all points are visited, a forest is constructed. Each connected component (=tree of a forest) is a cluster.

  • Problem: All methods presented above encounter a serious problem because of the following reasons:
   - Density may oscillate a lot
   - Valleys are not well defined.
  • Someone please insert figure here illustrating the above mentioned problem, as drawn in class.

PDE based valley seeking

PDE: Partial Differential Equation

PDE's can be used to minimize energy functionals

  • Simple Example

Consider 1-D curves

$ u(x): [0,1] \rightarrow \Re $ with fixed end points $ u(0)=a $, $ u(1)=b $ (3-1)

<<Picture>>

Suppose Energy of curve is $ E(u)=\int _0 ^1 F(u, u')dx $ for some function $ F: \Re ^2 \rightarrow \Re $

e.g.) $ F(u,u')=|u'|^2 $ (3-2)

The curve that minimizes (or maximizes) E(u) satisfies Euler Equation

$ \frac{\partial F} {\partial u} -\frac{d}{dx}(\frac{\partial F}{\partial u'})=0 $ (3-3)

sometimes written as $ E'=0 \Rightarrow \frac{\partial E}{\partial u}=0 $ (3-4)


Similarly if $ E=\int _0 ^1 F(u,u',u'')dx $ (3-5)

e.g.) $ F(u,u',u'')=|u''|^2 $ (3-6)

Then Euler equation is $ \frac{\partial F}{\partial u} - \frac{d}{dx}(\frac{\partial F}{\partial u'}) + \frac{d^2}{dx^2}(\frac{\partial F}{\partial u''})=0 $ (3-7)

Similarly, for surface in $ \Re ^2 $

$ u(x,y): [0,1] \times [0,1] \rightarrow \Re $ (3-8)

Suppose energy is given by

$ E(u)=\int _{surface} F(u,u_x, u_y, u_{xx}, u_{xy},u_{yy})dxdy $ (3-9)

e.g.) $ F={u_x}^2+{u_y}^2 $ (3-10)

Then Euler equation is

$ \frac{\partial F}{\partial u} - \frac{d}{dx}(\frac{\partial F}{\partial u_x}) - \frac{d}{dy}(\frac{\partial F}{\partial u_y}) + \frac{d^2}{dx^2}(\frac{\partial F}{\partial u_{xx}}) + \frac{d^2}{dy^2}(\frac{\partial F}{\partial u_{yy}}) + \frac{d^2}{dxdy}(\frac{\partial F}{\partial u_{xy}})=0 $ (3-11)

  • Illustration

1D curve $ u(x) $ with $ u(0)=a, u(1)=b $ (3-12)

$ E(u)=\int _{0} ^{1} ||\nabla u||^2 dx $ (3-13) Dirichlet integral

$ F=||\nabla u||^2=u_x ^2 $

Euler equation E'=0

$ \frac{\partial F}{\partial u} - \frac{d}{dx}(\frac{\partial F}{\partial u_x})=0 $ (3-14)

$ 0-\frac{d}{dx}(2u_x)=0 \Rightarrow -2u_{xx}=0 \Rightarrow u_{xx}=0 $ (3-15)

with boundary condition, $ u(0)=0, u(1)=b $

Solution: $ u(x)=c_1 x+ c_2 $

$ u(0)=a \Rightarrow c_2=a $ (3-16)

$ u(1)=b \Rightarrow c_1 + c_2 = b \Rightarrow c_1=b-a $ (3-17)

<<Picture>>

In general, cannot solve Euler equation analytically In practice, use 'Gradient descent'

Alumni Liaison

Ph.D. 2007, working on developing cool imaging technologies for digital cameras, camera phones, and video surveillance cameras.

Buyue Zhang