(continued from [Lecture 12])


[Kernel Functions]

====================


Main article: [Kernel Functions]


Last class introduced the [kernel functions] trick as a key to making [SVM] an effective tool for classifying data that is not linearly separable in the original input space.  Here we see some examples of kernel functions, and the condition that determines whether such a function corresponds to a dot product in some feature space.
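
As a small numerical sketch of this idea (illustrative only; numpy and the function names below are assumptions, not part of the lecture), one can verify that the degree-2 polynomial kernel K(x, y) = (x^T y)^2 equals an ordinary dot product after the explicit feature map phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)::

  import numpy as np

  def phi(x):
      # Explicit feature map for the degree-2 polynomial kernel on R^2.
      x1, x2 = x
      return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

  def poly_kernel(x, y):
      # Degree-2 polynomial kernel: K(x, y) = (x^T y)^2.
      return np.dot(x, y) ** 2

  x = np.array([1.0, 2.0])
  y = np.array([3.0, -1.0])

  # The kernel value equals the dot product taken in the feature space of phi.
  print(poly_kernel(x, y))         # (1*3 + 2*(-1))^2 = 1.0
  print(np.dot(phi(x), phi(y)))    # same value, computed in feature space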
 
 
What is a Neural Network?
 
------------------------------------------
 
 
An [Artificial Neural Network] is an information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information processing system. It is composed of a large number of highly interconnected processing elements (neurones) working in unison to solve specific problems. ANNs, like people, learn by example. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process. Learning in biological systems involves adjustments to the synaptic connections that exist between the neurones. This is true of ANNs as well.
 
 
General Properties:
 
 
Neural networks, with their remarkable ability to derive meaning from complicated or imprecise data, can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained neural network can be thought of as an "expert" in the category of information it has been given to analyze. This expert can then be used to provide projections given new situations of interest and answer "what if" questions.
 
Other advantages include:
 
 
  1. Adaptive learning: An ability to learn how to do tasks based on the data given for training or initial experience.
 
  2. Self-Organization: An ANN can create its own organization or representation of the information it receives during learning time.
 
 
Neural networks are a family of function approximation techniques, where the function to be approximated,
 
 
|NNF_1|
 
 
is modeled as a composition of simple functions |NNF_2|
 
 
|NNF_3|
 
 
The composition model is represented by a network
 
 
Several of the |NNF_2| are taken to be linear functions
 
 
The parameters of the linear functions are optimized to best fit the data
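
A minimal sketch of this composition view (illustrative only; numpy and the parameter values below are assumed, not taken from the notes): a one-hidden-layer network composes a linear map, a simple nonlinearity, and another linear map::

  import numpy as np

  def logistic(a):
      # Logistic nonlinearity, one of the simple functions f_i.
      return 1.0 / (1.0 + np.exp(-a))

  def network(x, W1, b1, w2, b2):
      # Composition of simple functions: linear -> logistic -> linear.
      h = logistic(W1 @ x + b1)   # first linear map followed by the nonlinearity
      return w2 @ h + b2          # second linear map gives the output

  # Tiny example with arbitrary parameters; in practice these are fit to data.
  x = np.array([0.5, -1.0])
  W1 = np.array([[1.0, -2.0], [0.3, 0.8]])
  b1 = np.array([0.1, -0.2])
  w2 = np.array([2.0, -1.0])
  b2 = 0.05
  print(network(x, W1, b1, w2, b2))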
 
 
Example: [Linear Discriminant Functions] can be seen as a two-layer Neural Network (NN)
 
 
recall |NNF_4|
 
 
|NNF_5|
 
 
write 
 
 
.. image:: x_bar.jpg
 
 
 
 
.. image:: NN_2layer_2.jpg
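
A minimal sketch of this equivalence (the variable names and the numpy dependency are assumptions): the linear discriminant decision sign(w^T x + w_0) is the same as a two-layer network whose first layer is the augmented input (1, x) and whose output node thresholds a weighted sum::

  import numpy as np

  def linear_discriminant(x, w, w0):
      # Decide class +1 or -1 from the sign of w^T x + w_0.
      return 1 if np.dot(w, x) + w0 > 0 else -1

  def two_layer_nn(x, w, w0):
      # Same decision viewed as a two-layer NN: input layer = augmented
      # input (1, x), output node = threshold of a weighted sum.
      x_bar = np.concatenate(([1.0], x))    # augmented input, as in the x_bar figure
      weights = np.concatenate(([w0], w))   # bias folded into the weight vector
      return 1 if np.dot(weights, x_bar) > 0 else -1

  x = np.array([2.0, -0.5])
  w = np.array([1.0, 3.0])
  w0 = -0.25
  print(linear_discriminant(x, w, w0), two_layer_nn(x, w, w0))  # identical decisions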
 
 
 
Example of a three-layer NN
 
 
.. image:: NN_3layer.JPG
 
 
Common types of the simple functions |NNF_2|:
 
 
linear: |linear_fx|
 
 
.. |linear_fx| image:: tex
 
  :alt: tex: f(\vec x)=\vec c^T\vec x+c_0
 
 
logistic: |logistic_fx|
 
 
.. |logistic_fx| image:: tex
 
  :alt: tex: f(x)=\frac{e^x}{1+e^x}
 
 
threshold: |threshold_fx|
 
 
.. |threshold_fx| image:: tex
 
  :alt: tex: f(x)=1,x>0;f(x)=0,else
 
 
hyperbolic tangent: |hypertan_fx|
 
 
.. |hypertan_fx| image:: tex
 
  :alt: tex: f(x)=\frac{e^x-e^{-x}}{e^x+e^{-x}}
 
 
sign function: |sign_fx|
 
 
.. |sign_fx| image:: tex
 
  :alt: tex: f(x)=1,x>0;f(x)=-1,else
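
Written out as code (a sketch only; numpy assumed), these choices look like::

  import numpy as np

  def linear(x, c, c0):
      return np.dot(c, x) + c0                 # f(x) = c^T x + c_0

  def logistic(x):
      return np.exp(x) / (1.0 + np.exp(x))     # f(x) = e^x / (1 + e^x)

  def threshold(x):
      return 1.0 if x > 0 else 0.0             # f(x) = 1 if x > 0, else 0

  def hyperbolic_tangent(x):
      return np.tanh(x)                        # f(x) = (e^x - e^{-x}) / (e^x + e^{-x})

  def sign(x):
      return 1.0 if x > 0 else -1.0            # f(x) = 1 if x > 0, else -1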
 
 
By Kolmogorov's superposition theorem, any continuous function |gx_map_R|
 
 
.. |gx_map_R| image:: tex
 
  :alt: tex: g(\vec x):[0,1]\times[0,1]\times\cdots\times[0,1]\rightarrow\Re
 
 
can be written as:
 
 
|gx_composite|
 
 
.. |gx_composite| image:: tex
 
  :alt: tex: g(\vec x)=\sum_{j=1}^{2n+1}G_j\left(\sum_{i=1}^{n}\psi_{ij}(x_i)\right)
 
 
Training Neural Networks  - "Back-Propagation Algorithm"
 
---------------------------------------------------------
 
 
.. |w_vect| image:: tex
 
  :alt: tex: \vec{w}
 
 
.. |xk_vect| image:: tex
 
  :alt: tex: \vec{x_k}
 
 
.. |zk| image:: tex
 
  :alt: tex: z_k
 
 
.. |tk| image:: tex
 
  :alt: tex: t_k
 
 
First, define a cost function to measure the error of the neural network with weights |w_vect|. Say a training input |xk_vect| produces the output |zk|, but the desired output is |tk|.
 
 
This cost function can be written as follows:
 
 
|jinha_jw|
 
 
.. |jinha_jw| image:: tex
 
  :alt: tex: J(\vec{w}) = \frac{1}{2} \sum_{k} (t_k - z_k)^2 = \frac{1}{2} \| \vec{t} - \vec{z} \| ^2
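
For example (a sketch; numpy assumed), the cost for one set of target and output values can be evaluated as::

  import numpy as np

  def cost(t, z):
      # J(w) = 1/2 * sum_k (t_k - z_k)^2, where the outputs z depend on the weights w.
      t, z = np.asarray(t), np.asarray(z)
      return 0.5 * np.sum((t - z) ** 2)

  print(cost([1.0, 0.0, 1.0], [0.8, 0.3, 0.9]))  # 0.5 * (0.04 + 0.09 + 0.01) = 0.07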
 
 
Then, we can optimize this cost function using the gradient descent method:
 
 
.. |jinha_w| image:: tex
 
  :alt: tex: \vec{w}
 
 
new |jinha_w| = old |jinha_w| + |jinha_dw|
 
 
.. |jinha_dw| image:: tex
 
  :alt: tex: \Delta \vec{w}
 
 
|jinha_gd|
 
 
.. |jinha_gd| image:: tex
 
  :alt: tex: \rightarrow \vec{w}(k+1) = \vec{w}(k) - \eta(k) \left(  \frac{\partial J}{\partial w_1}, \frac{\partial J}{\partial w_2}, \cdots , \frac{\partial J}{\partial w_{last}}  \right)
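
A minimal sketch of one such update (illustrative only; the network is reduced to a single logistic output unit so the gradient fits on one line, and numpy is assumed). The gradient is obtained by the chain rule, which is exactly what back-propagation generalizes to deeper networks::

  import numpy as np

  def logistic(a):
      return 1.0 / (1.0 + np.exp(-a))

  def gradient_descent_step(w, x, t, eta):
      # One update w(k+1) = w(k) - eta * dJ/dw for a single logistic output unit.
      z = logistic(np.dot(w, x))            # network output for input x
      # Chain rule for J = 1/2 (t - z)^2 with z = logistic(w^T x):
      # dJ/dw = -(t - z) * z * (1 - z) * x
      grad = -(t - z) * z * (1.0 - z) * x
      return w - eta * grad

  w = np.array([0.1, -0.3])
  x = np.array([1.0, 2.0])
  t = 1.0
  for _ in range(3):
      w = gradient_descent_step(w, x, t, eta=0.5)
  print(w)  # the weights move so that the output approaches the target t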
 
 
 
Previous: [Lecture 12]; Next: [Lecture 14]
 
