Revision as of 16:54, 15 April 2014

Introduction to local (nonparametric) density estimation methods

A slecture by Yu Liu

(partially based on Prof. Mireille Boutin's ECE662 lecture)

1. Introduction

This slecture introduces two local density estimation methods: Parzen density estimation and k-nearest neighbor density estimation. Local density estimation is also referred to as nonparametric density estimation. To make the distinction clear, let’s first look at parametric density estimation. In parametric density estimation, we assume that the density function has a known form that is determined by a set of parameters. The parameters are estimated from the sample data and are later used in designing the classifier. However, in some practical situations the assumption that the density function has a parametric form does not hold. For example, it is very hard to fit a multimodal probability distribution with a simple function. In this case, we need to estimate the density function in a nonparametric way, which means that the density function is estimated locally based on a small set of neighboring samples. Because each local estimate relies on only a few nearby samples, local (nonparametric) density estimation is generally less accurate than parametric density estimation when the parametric assumption is valid. In the following text the word “local” is preferred over “nonparametric.”

It is noteworthy that an accurate local density estimate is very difficult to obtain, especially when the dimension of the feature space is high. So why do we bother using local density estimation? Because our goal is not to get an accurate estimate, but rather to use the estimate to design a classifier that performs well. The inaccuracy of local density estimation does not necessarily lead to a poor decision rule.

2. General Principle

In local density estimation the density function p_n(x) can be approximated by

$$p_n(x) \approx \frac{k_n}{n\,v_n} \qquad (1)$$

where v_n is the volume of a small region R around point x, n is the total number of samples x_i (i = 1, 2, …, n) drawn according to p_n(x), and k_n is the number of x_i’s that fall into region R. The reason p_n(x) can be calculated in this way is that p_n(x) does not vary much within a relatively small region, so the probability mass of region R can be approximated by p_n(x)v_n, which is in turn approximated by k_n/n, the fraction of samples that fall into R.
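
The counting rule p_n(x) ≈ k_n/(n v_n) can be tried out directly. The following is a minimal sketch, assuming a one-dimensional feature space, an interval as the region R, and standard normal samples; the function name `local_density`, the half-width, and the sample size are illustrative choices, not part of the lecture.

```python
import random

def local_density(samples, x, half_width):
    """Estimate the density at x as k / (n * v) with R = [x - half_width, x + half_width]."""
    n = len(samples)
    v = 2.0 * half_width                                      # length (volume) of the 1-D region R
    k = sum(1 for s in samples if abs(s - x) <= half_width)   # samples that fall into R
    return k / (n * v)

rng = random.Random(0)                                        # fixed seed for reproducibility
samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]         # assumed test data: N(0, 1)

estimate = local_density(samples, x=0.0, half_width=0.1)
# The true standard normal density at 0 is 1/sqrt(2*pi), about 0.399.
print(f"estimate at x = 0: {estimate:.3f}")
```

With a region this small the estimate should land close to the true value, but it fluctuates from sample set to sample set because k_n is a random count.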

Some examples of region R in different dimensions: i) a line segment in one dimension, ii) a circle or rectangle in two dimensions, iii) a sphere or cube in three dimensions, iv) a hypersphere or hypercube in d dimensions (d > 3).

Three conditions we need to pay attention to when using formula (1) are:
i) $\lim_{n\to\infty} v_n = 0$. This is because if v_n is fixed, then p_n(x) only represents the average probability density over R as n grows larger, but what we need is the point probability density, so we should have $v_n \to 0$ when $n \to \infty$.
ii) $\lim_{n\to\infty} k_n = \infty$. This is to make sure that we do not get zero probability density.
iii) $\lim_{n\to\infty} k_n/n = 0$. This is to make sure that p_n(x) does not diverge.
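
To see how the three limits (v_n → 0, k_n → ∞, k_n/n → 0) can hold at the same time, the sketch below tabulates them for one assumed schedule, v_n = 1/√n, using the expected count k_n ≈ n p(x) v_n implied by formula (1). The function name `region_schedule` and the value p(x) = 1 are hypothetical, chosen only for illustration.

```python
import math

def region_schedule(n, p_at_x=1.0):
    """Return (v_n, expected k_n, k_n / n) for the assumed choice v_n = 1 / sqrt(n)."""
    v_n = 1.0 / math.sqrt(n)
    k_n = n * p_at_x * v_n        # expected number of samples in R, from k_n / n ~ p(x) * v_n
    return v_n, k_n, k_n / n

rows = [(n,) + region_schedule(n) for n in (10**2, 10**4, 10**6)]
for n, v_n, k_n, ratio in rows:
    print(f"n = {n:>7}   v_n = {v_n:.4f}   k_n ~ {k_n:.0f}   k_n/n = {ratio:.4f}")
```

As n grows, the region shrinks and the fraction k_n/n shrinks with it, while the count k_n itself keeps growing, which is the balance the three conditions ask for.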

3. Parzen Density Estimation

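In Parzen density estimation v_n is directly determined by n, while k_n is a random variable that denotes the number of samples that fall into the region of volume v_n. Assume that the region R is a d-dimensional hypercube with edge length h_n, thus
v_n = (h_n)^d
The equivalent conditions which meet the aforementioned three conditions are:
$\lim_{n\to\infty} v_n = 0$ and $\lim_{n\to\infty} n\,v_n = \infty$
Therefore v_n can be chosen as $v_n = h/\sqrt{n}$ or $v_n = h/\ln n$, where h is an adjustable constant. Now that the relationship between v_n and n is defined, the next step is to determine k_n. To determine k_n define a window function as follows:

$$\varphi\!\left(\frac{x - x_i}{h_n}\right) = \begin{cases} 1, & \dfrac{|x_j - x_{ij}|}{h_n} \le \dfrac{1}{2} \ \text{ for } j = 1, \dots, d \\ 0, & \text{otherwise} \end{cases}$$

where x_i’s (i = 1, 2, …, n) are the given samples, x is the point where the density is to be estimated, and x_j and x_{ij} denote the j-th components of x and x_i. Thus we have

$$k_n = \sum_{i=1}^{n} \varphi\!\left(\frac{x - x_i}{h_n}\right)$$

The function $\varphi$ is called a Parzen window function, which enables us to count the number of sample points in the hypercube with edge length h_n centered at x.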
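
The hypercube window amounts to a coordinate-wise test, so counting k_n takes only a few lines. This is a minimal sketch, assuming the window returns 1 when every coordinate of (x − x_i)/h_n lies within [−1/2, 1/2] and 0 otherwise; the function names and the toy samples are made up for illustration.

```python
def hypercube_window(u):
    """phi(u) = 1 if every coordinate of u lies within [-1/2, 1/2], else 0."""
    return 1.0 if all(abs(c) <= 0.5 for c in u) else 0.0

def parzen_hypercube(samples, x, h_n):
    """Return k_n / (n * v_n), with v_n = h_n ** d and k_n counted by the window."""
    n = len(samples)
    d = len(x)
    v_n = h_n ** d
    k_n = sum(
        hypercube_window([(xj - sj) / h_n for xj, sj in zip(x, s)])
        for s in samples
    )
    return k_n / (n * v_n)

# Toy 2-D data: two of the four points lie inside the unit square centered at the origin.
toy = [(0.0, 0.0), (0.2, -0.1), (0.6, 0.0), (2.0, 2.0)]
density_at_origin = parzen_hypercube(toy, x=(0.0, 0.0), h_n=1.0)
print(density_at_origin)  # 0.5, since k_n = 2, n = 4, v_n = 1
```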

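According to [2], using the hypercube as the window function may lead to discontinuities in the estimate. This is due to the superposition of sharp pulses centered at the given sample points when h is small. To overcome this shortcoming, we can consider a more general form of window function rather than the hypercube. Note that if the following two conditions are met, the estimated p_n(x) is guaranteed to be a proper density.
$\varphi(u) \ge 0$ and $\int \varphi(u)\,du = 1$
Therefore a better choice of window function, which removes the discontinuity, is the Gaussian window:

$$\varphi(u) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{u^2}{2}\right)$$

The estimated density is given by
$$p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{v_n}\, \varphi\!\left(\frac{x - x_i}{h_n}\right) \qquad (2)$$
Consider the one-dimensional case and assume that $v_n = h/\sqrt{n}$, thus $h_n = h/\sqrt{n}$, where h is an adjustable constant. Substituting into formula (2) we have

$$p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_n \sqrt{2\pi}} \exp\!\left(-\frac{(x - x_i)^2}{2 h_n^2}\right) = \frac{1}{h \sqrt{2\pi n}} \sum_{i=1}^{n} \exp\!\left(-\frac{n\,(x - x_i)^2}{2 h^2}\right)$$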

We can see that if n equals one, p_n(x) is just the (scaled) window function centered at the single sample. If n approaches infinity, p_n(x) can converge to a density of arbitrarily complex form. If n is relatively small, p_n(x) is very sensitive to the value of h. In general, a small h leads to a noisy estimate while a large h leads to an over-smoothed estimate, which can be illustrated by the following example.
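
Before the 2-D example, the one-dimensional Gaussian-window estimate can be sketched in a few lines. The code assumes the schedule h_n = h/√n (so v_n = h_n in one dimension) and standard normal test data; the three values of h are arbitrary illustrative choices, not values from the lecture.

```python
import math
import random

def gaussian_window(u):
    """Gaussian window: phi(u) = exp(-u^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def parzen_gaussian_1d(samples, x, h):
    """1-D Parzen estimate with the assumed schedule h_n = h / sqrt(n), v_n = h_n."""
    n = len(samples)
    h_n = h / math.sqrt(n)
    return sum(gaussian_window((x - s) / h_n) for s in samples) / (n * h_n)

rng = random.Random(1)                                  # fixed seed for reproducibility
samples = [rng.gauss(0.0, 1.0) for _ in range(2000)]    # assumed test data: N(0, 1)

# Estimates at x = 0 for a small, a moderate, and a large h (illustrative values).
estimates = {h: parzen_gaussian_1d(samples, 0.0, h) for h in (0.5, 10.0, 200.0)}
for h, p in estimates.items():
    print(f"h = {h:>6}: p_n(0) = {p:.3f}   (true density at 0 is about 0.399)")
```

With a single sample and h = 1 the estimate reduces to the window itself, and with the large h the estimate at the mode is pulled well below the true value, which is the over-smoothing effect described above.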

In this experiment the samples are 5000 points on a 2-D plane drawn from a Gaussian distribution. The mean vector is [1 2], and the covariance matrix is [1 0; 0 1]. A rectangular Parzen window with edge length h_n is chosen, thus v_n = (h_n)^2. Fig. 1 shows the sample distribution. Fig. 2 shows the ideal probability density distribution. Fig. 3 shows the result of Parzen density estimation.

Figure 1. 5000 sample points on 2-D plane with Gaussian distribution

Figure 2. The ideal probability density distribution

Figure 3. The result of Parzen density estimation
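
A rough way to reproduce the flavor of this experiment is sketched below. It draws 5000 points with mean [1 2] and identity covariance and evaluates a square-window Parzen estimate at the mean. The edge length `H_N = 0.5` is an assumption made only for this sketch (the value used for the figures is not stated here), and the function names are hypothetical.

```python
import math
import random

MEAN = (1.0, 2.0)      # mean vector from the experiment
N_SAMPLES = 5000
H_N = 0.5              # ASSUMED edge length, not the value used for the figures

rng = random.Random(0)
# Identity covariance: the two coordinates are independent unit-variance Gaussians.
samples = [(rng.gauss(MEAN[0], 1.0), rng.gauss(MEAN[1], 1.0)) for _ in range(N_SAMPLES)]

def parzen_square(samples, x, y, h_n):
    """Square-window Parzen estimate k_n / (n * h_n^2) at the point (x, y)."""
    half = h_n / 2.0
    k_n = sum(1 for sx, sy in samples if abs(sx - x) <= half and abs(sy - y) <= half)
    return k_n / (len(samples) * h_n ** 2)

def ideal_density(x, y):
    """Density of the 2-D Gaussian with mean MEAN and identity covariance."""
    dx, dy = x - MEAN[0], y - MEAN[1]
    return math.exp(-0.5 * (dx * dx + dy * dy)) / (2.0 * math.pi)

estimate_at_mean = parzen_square(samples, MEAN[0], MEAN[1], H_N)
print(f"Parzen estimate at the mean: {estimate_at_mean:.3f}")
print(f"ideal density at the mean:   {ideal_density(*MEAN):.3f}")
```

Evaluating `parzen_square` over a grid of (x, y) points would give a surface comparable in spirit to Fig. 3, and `ideal_density` over the same grid one comparable to Fig. 2.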

Next we change the value of h_n and see how it affects the estimation. Fig. 4 shows the result of Parzen density estimation when h_n is twice its initial value. Fig. 5 shows the result of Parzen density estimation when h_n is half its initial value. We can see that the results agree with the behavior described above: a larger h_n over-smooths the estimate, while a smaller h_n makes it noisier.
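
The effect of doubling and halving h_n can also be checked numerically. The sketch below measures how "rough" the square-window estimate is along the line y = 2 by summing the absolute differences between neighboring grid values. The initial edge length `BASE_H = 0.5`, the grid, and the roughness measure itself are assumptions introduced for this illustration, not part of the original experiment.

```python
import random

rng = random.Random(0)
# Same setup as the experiment: 5000 points, mean [1 2], identity covariance.
samples = [(rng.gauss(1.0, 1.0), rng.gauss(2.0, 1.0)) for _ in range(5000)]

def parzen_square(samples, x, y, h_n):
    """Square-window Parzen estimate k_n / (n * h_n^2) at the point (x, y)."""
    half = h_n / 2.0
    k_n = sum(1 for sx, sy in samples if abs(sx - x) <= half and abs(sy - y) <= half)
    return k_n / (len(samples) * h_n ** 2)

def roughness(h_n):
    """Sum of absolute changes of the estimate along y = 2 for x in [-2, 4]."""
    xs = [-2.0 + 0.1 * i for i in range(61)]
    profile = [parzen_square(samples, x, 2.0, h_n) for x in xs]
    return sum(abs(b - a) for a, b in zip(profile, profile[1:]))

BASE_H = 0.5   # ASSUMED initial edge length, not the value used for the figures
results = {
    "half": roughness(BASE_H / 2.0),
    "initial": roughness(BASE_H),
    "double": roughness(2.0 * BASE_H),
}
for label, value in results.items():
    print(f"{label:>8}: roughness = {value:.3f}")
```

A smooth single-peaked profile changes only as much as it needs to rise to the peak and fall again, so a noticeably larger roughness for the halved window indicates sampling noise rather than structure in the density.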

