When applying the K-nearest neighbor (KNN) method or an Artificial Neural Network (ANN) for classification, the first question we need to answer is how to choose the model (i.e. in KNN, what should K be? In ANN, how many hidden layers do we need?).

A popularly used method is leave-one-out cross validation. Let's assume we want to find the optimum parameter $ \lambda $ among M choices (in the KNN case, $ \lambda $ is K; in the ANN case, $ \lambda $ is the number of hidden layers). Assume that we have a data set of N samples.

For each choice of $ \lambda $, do the following steps:

1. Do N experiments. In each experiment, use N-1 samples for training, and leave only 1 sample for testing.

2. Compute the testing error $ E_i $, $ i = 1, \ldots, N $.

3. After N experiments, compute the overall estimated error:

$ E_\lambda = \frac{1}{N}\left( {\sum\limits_{i = 1}^N {E_i } } \right) $

4. Repeat for all $ \lambda $ and choose the one that gives us the smallest overall estimated error, as shown in the sketch after this list.
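
Below is a minimal sketch of this procedure in Python for the KNN case. It assumes a Euclidean distance metric and a small synthetic data set; the helper names knn_predict and loo_error, the candidate values of K, and the toy data are illustrative assumptions, not part of the original text.

import numpy as np

def knn_predict(X_train, y_train, x_test, k):
    # Classify one sample by majority vote among its k nearest training samples.
    dists = np.linalg.norm(X_train - x_test, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def loo_error(X, y, k):
    # Steps 1-3: N experiments, each training on N-1 samples and testing on
    # the single held-out sample; average the errors E_i to get E_lambda.
    N = len(X)
    mistakes = 0
    for i in range(N):
        mask = np.arange(N) != i                  # leave sample i out
        if knn_predict(X[mask], y[mask], X[i], k) != y[i]:
            mistakes += 1
    return mistakes / N

# Step 4: evaluate every candidate K and keep the minimizer.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))                      # toy data, for illustration only
y = (X[:, 0] + X[:, 1] > 0).astype(int)
errors = {k: loo_error(X, y, k) for k in [1, 3, 5, 7, 9]}
best_k = min(errors, key=errors.get)
print(errors, "-> best K =", best_k)

The same loop applies unchanged to the ANN case: replace knn_predict with a network trained on the N-1 samples and let $ \lambda $ index the number of hidden layers.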


Figure 1: the way the data set is split in this technique (image: OldKiwi.jpg)
