Revision as of 16:31, 15 April 2014

Introduction to local (nonparametric) density estimation methods

A slecture by Yu Liu

(partially based on Prof. Mireille Boutin's ECE662 lecture)

1. Introduction

This slecture introduces two local density estimation methods: Parzen density estimation and k-nearest neighbor density estimation. Local density estimation is also referred to as nonparametric density estimation. To make the distinction clear, let’s first look at parametric density estimation. In parametric density estimation, we assume that the density function has a known form that is determined by a set of parameters. The parameters are estimated from the sample data and are later used in designing the classifier. However, in some practical situations the assumption that the density function has a parametric form does not hold. For example, it is very hard to fit a multimodal probability distribution with a simple function. In this case, we need to estimate the density function in a nonparametric way, which means that the density at each point is estimated locally from a small set of neighboring samples. Because each local estimate uses only those few samples, local (nonparametric) density estimation is generally less accurate than parametric density estimation when the parametric assumption holds. In the following text the word “local” is preferred over “nonparametric.”
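As a rough illustration of the multimodal case mentioned above, the following Python sketch fits a single Gaussian to samples drawn from a two-mode mixture. The mixture parameters, the sample size, and the helper names (`gaussian_pdf`, `true_pdf`) are assumptions chosen for this example; they are not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy data: equal mixture of N(-3, 1) and N(3, 1), i.e. a bimodal density.
samples = np.concatenate([rng.normal(-3.0, 1.0, 5000),
                          rng.normal(3.0, 1.0, 5000)])


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def true_pdf(x):
    """The bimodal density the samples were actually drawn from."""
    return 0.5 * gaussian_pdf(x, -3.0, 1.0) + 0.5 * gaussian_pdf(x, 3.0, 1.0)


# Parametric fit with a single Gaussian (maximum-likelihood mean and std).
mean_hat = samples.mean()
std_hat = samples.std()

# Compare the fitted and true densities in the valley (x = 0) and at a mode (x = 3).
fitted_at_valley = gaussian_pdf(0.0, mean_hat, std_hat)
true_at_valley = true_pdf(0.0)
fitted_at_mode = gaussian_pdf(3.0, mean_hat, std_hat)
true_at_mode = true_pdf(3.0)

print(f"x=0: fitted {fitted_at_valley:.4f}, true {true_at_valley:.4f}")
print(f"x=3: fitted {fitted_at_mode:.4f}, true {true_at_mode:.4f}")
```

In this toy setting the single Gaussian puts far too much density in the valley between the modes and too little at the modes themselves, which is the kind of mismatch that motivates a local estimate.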

It is worth noting that an accurate local density estimate is very difficult to obtain, especially when the dimension of the feature space is high. So why bother with local density estimation? Because our goal is not an accurate estimate in itself, but rather an estimate that can be used to design a well-performing classifier. An inaccurate local density estimate does not necessarily lead to a poor decision rule.

2. General Principle

In local density estimation the density function p_n(x) can be approximated by

p_n(x) ≈ k_n / (n v_n)

where v_n is the volume of a small region R around the point x, n is the total number of samples x_i (i = 1, 2, …, n) drawn according to p_n(x), and k_n is the number of x_i’s that fall into region R. This works because p_n(x) does not vary much within a relatively small region, so the probability mass of region R can be approximated by p_n(x)v_n; this mass is in turn estimated by the fraction of samples falling in R, k_n/n.
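The counting argument above can be sketched in a few lines of Python. This is a minimal one-dimensional example under stated assumptions: the region R is taken to be the interval [x - h, x + h], so its "volume" v_n is the interval length 2h, and the samples come from a standard normal density. The function name `local_density_estimate` and the parameter `half_width` are hypothetical names introduced for this illustration.

```python
import numpy as np


def local_density_estimate(x, samples, half_width):
    """Estimate the density at x as k_n / (n * v_n).

    The region R is assumed to be the interval [x - half_width, x + half_width].
    """
    samples = np.asarray(samples, dtype=float)
    n = samples.size                       # total number of samples
    v_n = 2.0 * half_width                 # 1-D "volume" of R (interval length)
    k_n = np.count_nonzero(np.abs(samples - x) <= half_width)  # samples inside R
    return k_n / (n * v_n)


# Assumed toy data: samples from a standard normal density.
rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, 100_000)

estimate = local_density_estimate(0.0, samples, half_width=0.1)
true_value = 1.0 / np.sqrt(2.0 * np.pi)    # standard normal density at x = 0

print(f"estimate {estimate:.4f}, true value {true_value:.4f}")
```

With many samples and a small region the estimate lands close to the true density at x. Making the region too large blurs the estimate, while making it too small leaves too few samples inside it.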

@@ Line 5: / Line 5: @@
 (partially based on Prof. [https://engineering.purdue.edu/~mboutin Mireille Boutin]'s ECE[https://www.projectrhea.org/rhea/index.php/ECE662 662] lecture)
 [https://www.projectrhea.org/rhea/images/8/8a/Slecture_introduction_to_local_density_estimation_methods.pdf click here for PDF version]
+----
+== '''1. Introduction'''<br>  ==
+This slecture introduces two local density estimation methods: Parzen density estimation and k-nearest neighbor density estimation. Local density estimation is also referred to as nonparametric density estimation. To make the distinction clear, let’s first look at parametric density estimation. In parametric density estimation, we assume that the density function has a known form that is determined by a set of parameters. The parameters are estimated from the sample data and are later used in designing the classifier. However, in some practical situations the assumption that the density function has a parametric form does not hold. For example, it is very hard to fit a multimodal probability distribution with a simple function. In this case, we need to estimate the density function in a nonparametric way, which means that the density at each point is estimated locally from a small set of neighboring samples. Because each local estimate uses only those few samples, local (nonparametric) density estimation is generally less accurate than parametric density estimation when the parametric assumption holds. In the following text the word “local” is preferred over “nonparametric.”<br>
+It is worth noting that an accurate local density estimate is very difficult to obtain, especially when the dimension of the feature space is high. So why bother with local density estimation? Because our goal is not an accurate estimate in itself, but rather an estimate that can be used to design a well-performing classifier. An inaccurate local density estimate does not necessarily lead to a poor decision rule.<br>
+== '''2. General Principle'''  ==
+In local density estimation the density function ''p<sub>n</sub>''(''x'') can be approximated by
+<center><math>p_n(x) \approx \frac{k_n}{n\,v_n}</math></center>
+where ''v<sub>n</sub>'' is the volume of a small region ''R'' around the point ''x'', ''n'' is the total number of samples ''x''<sub>''i''</sub> (''i'' = 1, 2, …, ''n'') drawn according to ''p<sub>n</sub>''(''x''), and ''k''<sub>''n''</sub> is the number of ''x''<sub>''i''</sub>’s that fall into region ''R''. This works because ''p''<sub>''n''</sub>(''x'') does not vary much within a relatively small region, so the probability mass of region ''R'' can be approximated by ''p''<sub>''n''</sub>(''x'')''v''<sub>''n''</sub>; this mass is in turn estimated by the fraction of samples falling in ''R'', ''k''<sub>''n''</sub>/''n''.

Difference between revisions of "Introduction to local density estimation methods" - Rhea
