Difference between revisions of "Perceptron Convergence Theorem OldKiwi" - Rhea

Latest revision as of 06:37, 21 April 2013

A proof of the convergence of the Perceptron

ECE662: Statistical decision theory and decision making processes

(Supplement to Lecture 10 of the lecture notes)

The following theorem, due to Novikoff (1962), proves the convergence of a perceptron_OldKiwi using linearly-separable samples. This proof was taken from Learning Kernel Classifiers, Theory and Algorithms By Ralf Herbrich

Consider the following definitions: A training set $z=(x,y)\in\mathbb{Z}^m$

"Functional margin" on example $(x_i,y_i)\in z$ is defined as $\tilde{\gamma}_i = y_i\langle x_i,c \rangle$ ;

"Functional margin" on training set $$ z $$ is defined as $\tilde{\gamma}_z = \min_{(x_i,y_i)\in z} \tilde{\gamma}_i$ ;

"Geometrical margin" on example $(x_i,y_i)\in z$ is defined as $\gamma_i = \frac{\tilde{\gamma}_i(c)}{\|c\|}$ ;

"Geometrical margin" on training set $$ z $$ is defined as $\gamma_z = \frac{\tilde{\gamma}_z(c)}{\|c\|}$ .

Let $\psi = \max_{x_i\in x} \|\phi(x_i)\|$ , where $\phi(x_i)$ is the feature vector for $$ x_i $$ .

Now let $$ c_k $$ be a final solution vector after $$ k $$ steps. Then the last update of the Perceptron algorithm has

$c_k = c_{k-1} + y_ix_i$ .

For any vector, $$ c^* $$ , we have

$\langle c^*,c_k \rangle = \langle c^*,c_{k-1} \rangle + y_i \langle c^*,x_i \rangle \geq \langle c^*,c_{k-1}\rangle + \gamma_z(c^*) \geq \ldots \geq k\gamma_z(c^*)$ ,

where the inequalities follow from repeated applications up to step 0 where we assume $$ c_0=0 $$ . Similarly, by the algorithm definition,

$\|c_k\|^2 = \|c_{k-1}\|^2 + 2y_i\langle c_{k-1},x_i \rangle + \|x_i\|^2 \leq \|c_{k-1}\|^2 + \psi^2 \leq \ldots \leq k\psi^2.$

Then by the Cauchy-Schwartz inequality, we have

$k\gamma_z(c^*) \leq \langle c^*,c_k \rangle \leq \|c^*\| \cdot \|c_k\| \leq \sqrt{k} \psi$ .

It follows, then, that the number of required iterations of the Perceptron algorithm has a finite upper bound, i.e.

$k\leq \left( \frac{\psi}{\gamma_z(c^*)}\right)^2$

Back to Lecture 10

@@ Line 1: / Line 1: @@
-(See [[Lecture 10_OldKiwi]])
+<center><font size= 4>
+'''A proof of the convergence of the Perceptron'''
+</font size>
-The following theorem, due to Novikoff (1962), proves the convergence of a perceptron using linearly-separable samples.  This proof was taken from [http://books.google.com/books?id=el_cDjHdP0cC&pg=PA51&lpg=PA51&dq=novikoff+perceptron+convergence&source=web&ots=2pWu1dQHIM&sig=K5ubpLbO4n0XjrgzhEuOD06Xfmo#PPA261,M1 Learning Kernel Classifiers, Theory and Algorithms By Ralf Herbrich]
+[[ECE662]]: Statistical decision theory and decision making processes
+(Supplement to [[Lecture_10_-_Batch_Perceptron_and_Fisher_Linear_Discriminant_OldKiwi|Lecture 10]] of the [[ECE662_Pattern_Recognition_Decision_Making_Processes_Spring2008_sLecture_collective|lecture notes]])
+</center>
+----
+The following theorem, due to Novikoff (1962), proves the convergence of a [[perceptron_OldKiwi]] using linearly-separable samples.  This proof was taken from [http://books.google.com/books?id=el_cDjHdP0cC&pg=PA51&lpg=PA51&dq=novikoff+perceptron+convergence&source=web&ots=2pWu1dQHIM&sig=K5ubpLbO4n0XjrgzhEuOD06Xfmo#PPA261,M1 Learning Kernel Classifiers, Theory and Algorithms By Ralf Herbrich]
 Consider the following definitions:
-<math> </math>
 A training set <math>z=(x,y)\in\mathbb{Z}^m</math>
@@ Line 21: / Line 29: @@
 <math>c_k = c_{k-1} + y_ix_i</math>.
-For any vector, |c*|, we have
+For any vector, <math>c^*</math>, we have
-|cc_kinnerprod|,
+<math>\langle c^*,c_k \rangle = \langle c^*,c_{k-1} \rangle + y_i \langle c^*,x_i \rangle \geq \langle c^*,c_{k-1}\rangle + \gamma_z(c^*) \geq \ldots \geq k\gamma_z(c^*)</math>,
-.. |c*| image:: tex
+where the inequalities follow from repeated applications up to step 0 where we assume <math>c_0=0</math>. Similarly, by the algorithm definition,
-   :alt: tex: c^*
-.. |cc_kinnerprod| image:: tex
+<math>\|c_k\|^2 = \|c_{k-1}\|^2 + 2y_i\langle c_{k-1},x_i \rangle + \|x_i\|^2 \leq \|c_{k-1}\|^2 + \psi^2 \leq \ldots \leq k\psi^2.</math>
-   :alt: tex: \langle c^*,c_k \rangle = \langle c^*,c_{k-1} \rangle + y_i \langle c^*,x_i \rangle \geq \langle c^*,c_{k-1}\rangle + \gamma_z(c^*) \geq \ldots \geq k\gamma_z(c^*)
-where the inequalities follow from repeated applications up to step 0 where we assume |c0|. Similarly, by the algorithm definition,
-|cknorm|
-.. |c0| image:: tex
-   :alt: tex: c_0=0
-.. |cknorm| image:: tex
-   :alt: tex: \|c_k\|^2 = \|c_{k-1}\|^2 + 2y_i\langle c_{k-1},x_i \rangle + \|x_i\|^2 \leq \|c_{k-1}\|^2 + \psi^2 \leq \ldots \leq k\psi^2.
 Then by the Cauchy-Schwartz inequality, we have
-|kgamz|.
+<math>k\gamma_z(c^*) \leq \langle c^*,c_k \rangle \leq \|c^*\| \cdot \|c_k\| \leq \sqrt{k} \psi</math>.
-.. |kgamz| image:: tex
-   :alt: tex: k\gamma_z(c^*) \leq \langle c^*,c_k \rangle \leq \|c^*\| \cdot \|c_k\| \leq \sqrt{k} \psi
 It follows, then, that the number of required iterations of the Perceptron algorithm has a finite upper bound, i.e.
-|kbound|
+<math>k\leq \left( \frac{\psi}{\gamma_z(c^*)}\right)^2</math>
+----
-.. |kbound| image:: tex
+[[Lecture_10_-_Batch_Perceptron_and_Fisher_Linear_Discriminant_OldKiwi|Back to Lecture 10]]
-   :alt: tex: k\leq \left( \frac{\psi}{\gamma_z(c^*)}\right)^2
-.. |zdef| image:: tex
-   :alt: tex: z=(x,y)\in\mathbb{Z}^m
-.. |z| image:: tex
-   :alt: tex: z
-.. |xiyiinz| image:: tex
-   :alt: tex: (x_i,y_i)\in z
-.. |gammaitildeDef| image:: tex
-   :alt: tex: \tilde{\gamma}_i = y_i\langle x_i,c \rangle
-.. |gammaztildeDef| image:: tex
-   :alt: tex: \tilde{\gamma}_z = \min_{(x_i,y_i)\in z} \tilde{\gamma}_i
-.. |gammaidef| image:: tex
-   :alt: tex: \gamma_i = \frac{\tilde{\gamma}_i(c)}{\|c\|}
-.. |gammazdef| image:: tex
-   :alt: tex: \gamma_z = \frac{\tilde{\gamma}_z(c)}{\|c\|}
-.. |psidef| image:: tex
-   :alt: tex: \psi = \max_{x_i\in x} \|\phi(x_i)\|
-.. |feature| image:: tex
-   :alt: tex: \phi(x_i)
-.. |xi| image:: tex
-   :alt: tex: x_i
-.. |stagekdef| image:: tex
-   :alt: tex: c_k = c_{k-1} + y_ix_i
-.. |ck| image:: tex
-   :alt: tex: c_k
-.. |k| image:: tex
-   :alt: tex: k

Difference between revisions of "Perceptron Convergence Theorem OldKiwi" - Rhea

Latest revision as of 06:37, 21 April 2013

Alumni Liaison