K Nearest Neighbors
Partly based on the ECE662 Spring 2014 lecture material of Prof. Mireille Boutin.
K Nearest Neighbors is a classification algorithm based on local density estimation.
The goal of a classifier is to take a data-point $ x_{0} $, usually represented as a vector, and assign it a class label from some set of $ C $ classes $ w_1, w_2, \ldots, w_C $.
If the data-points from each class are random variables, then it can be proven that the optimal decision rule for classifying a point $ x_0 $ is Bayes Rule: choose the class $ w_i $ for which the posterior probability $ P(w_i|x_0) $ is highest.
Therefore, it is desirable to assume that the data-points are random variables, and attempt to estimate $ P(w_i|x_0) $, in order to use it to classify the points.
By Bayes Theorem, $ P(w_i|x_0) = \frac{P(x_0|w_i)P(w_i)}{P(x_0)} $. Since the denominator $ P(x_0) $ is the same for every class, comparing the posteriors across classes is equivalent to comparing $ P(x_0|w_i)P(w_i) $. This is advantageous because it is often easier to estimate $ P(x_0|w_i) $ and $ P(w_i) $ separately than to estimate $ P(w_i|x_0) $ directly.
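Written out, the resulting decision rule is (standard notation, added here for concreteness):

$ \hat{w}(x_0) = \arg\max_{i \in \{1, \ldots, C\}} P(w_i|x_0) = \arg\max_{i \in \{1, \ldots, C\}} P(x_0|w_i)P(w_i) $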
K Nearest Neighbors does not explicitly estimate $ P(x_0|w_i) $ and $ P(w_i) $; instead, it estimates $ P(w_i|x_0) $ implicitly and chooses the class with the highest estimated $ P(w_i|x_0) $.
This method belongs to the class of local density estimation methods because it forms a separate density estimate locally around each test point $ x_0 $. This is in contrast to non-local, parametric methods, which assume a functional form (for example, Gaussian) for the distributions and estimate the entire distribution by finding the parameters of that form (the mean and variance, in the Gaussian example).
The classifier is given a set of $ N $ data-points $ x_{1}, x_{2}, \ldots, x_{N} $ along with corresponding labels $ y_{1}, y_{2}, \ldots, y_{N} $.
Together these are often called the training set, because they are used to "train" the classifier, i.e. estimate the distributions.
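The way K Nearest Neighbors turns neighbor counts into an implicit posterior estimate can be sketched as follows (this is the standard k-nearest-neighbor density-estimation argument; the symbols $ k $, $ k_i $, $ N_i $ and $ V $ are introduced only for this sketch). Let $ V $ be the volume of the smallest region around $ x_0 $ that contains its $ k $ nearest training points, let $ k_i $ be the number of those neighbors belonging to class $ w_i $, and let $ N_i $ be the total number of training points from class $ w_i $. Then

$ P(x_0|w_i) \approx \frac{k_i}{N_i V}, \qquad P(w_i) \approx \frac{N_i}{N}, \qquad P(x_0) \approx \frac{k}{N V} $

so that, by Bayes Theorem,

$ P(w_i|x_0) = \frac{P(x_0|w_i)P(w_i)}{P(x_0)} \approx \frac{k_i}{k} $

Choosing the class with the highest estimated posterior therefore amounts to a majority vote among the $ k $ nearest neighbors of $ x_0 $.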
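For illustration, here is a minimal sketch of the resulting classifier in Python (the function name, the use of NumPy, and the choice of Euclidean distance are my own assumptions; the original text does not prescribe an implementation):

import numpy as np

def knn_classify(x0, X_train, y_train, k=5):
    # Euclidean distance from the test point x0 to every training point x_1, ..., x_N
    distances = np.linalg.norm(X_train - x0, axis=1)
    # Indices of the k nearest training points
    nearest = np.argsort(distances)[:k]
    # Count how many of the k neighbors carry each label (k_i for each class w_i)
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    # The class with the largest count has the largest estimated posterior k_i / k
    return labels[np.argmax(counts)]

# Toy two-class training set
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.1, 0.8]])
y_train = np.array([1, 1, 2, 2])
print(knn_classify(np.array([0.2, 0.1]), X_train, y_train, k=3))  # prints 1

In this sketch, ties between classes are broken arbitrarily by the argmax; in practice, an odd $ k $ is often chosen for two-class problems to avoid ties.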
Questions and comments
If you have any questions, comments, etc. please post them on this page.