(continued from [Lecture 12])


[Kernel Functions]

====================


Main article: [Kernel Functions]


Last class introduced the [kernel functions] trick as a key to making [SVM] an effective tool for classifying data that is not linearly separable in the original input space.  Here we see some examples of kernel functions, and the condition that determines whether such a function corresponds to a dot product in some feature space.
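
As a small numerical sketch of this idea (illustrative only; numpy and the function names below are assumptions, not part of the lecture), one can verify that the degree-2 polynomial kernel K(x, y) = (x^T y)^2 equals an ordinary dot product after the explicit feature map phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)::

  import numpy as np

  def phi(x):
      # Explicit feature map for the degree-2 polynomial kernel on R^2.
      x1, x2 = x
      return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

  def poly_kernel(x, y):
      # Degree-2 polynomial kernel: K(x, y) = (x^T y)^2.
      return np.dot(x, y) ** 2

  x = np.array([1.0, 2.0])
  y = np.array([3.0, -1.0])

  # The kernel value equals the dot product taken in the feature space of phi.
  print(poly_kernel(x, y))         # (1*3 + 2*(-1))^2 = 1.0
  print(np.dot(phi(x), phi(y)))    # same value, computed in feature space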
 
 
What is a Neural Network?
 
------------------------------------------
 
 
An [Artificial Neural Network] is an information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information processing system. It is composed of a large number of highly interconnected processing elements (neurones) working in unison to solve specific problems. ANNs, like people, learn by example. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process. Learning in biological systems involves adjustments to the synaptic connections that exist between the neurones. This is true of ANNs as well.
 
 
General Properties:
 
 
Neural networks, with their remarkable ability to derive meaning from complicated or imprecise data, can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained neural network can be thought of as an "expert" in the category of information it has been given to analyze. This expert can then be used to provide projections given new situations of interest and answer "what if" questions.
 
Other advantages include:
 
 
  1. Adaptive learning: An ability to learn how to do tasks based on the data given for training or initial experience.
 
  2. Self-Organization: An ANN can create its own organization or representation of the information it receives during learning time.
 
 
Neural networks are a family of function approximation techniques, where the function to be approximated,
 
 
|NNF_1|
 
 
is modeled as a composition of simple functions |NNF_2|
 
 
|NNF_3|
 
 
The composition model is represented by a network
 
 
Several of the |NNF_2| are taken to be linear functions
 
 
The parameters of the linear functions are optimized to best fit the data
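
A minimal sketch of this composition view (illustrative only; numpy and the parameter values below are assumed, not taken from the notes): a one-hidden-layer network composes a linear map, a simple nonlinearity, and another linear map::

  import numpy as np

  def logistic(a):
      # Logistic nonlinearity, one of the simple functions f_i.
      return 1.0 / (1.0 + np.exp(-a))

  def network(x, W1, b1, w2, b2):
      # Composition of simple functions: linear -> logistic -> linear.
      h = logistic(W1 @ x + b1)   # first linear map followed by the nonlinearity
      return w2 @ h + b2          # second linear map gives the output

  # Tiny example with arbitrary parameters; in practice these are fit to data.
  x = np.array([0.5, -1.0])
  W1 = np.array([[1.0, -2.0], [0.3, 0.8]])
  b1 = np.array([0.1, -0.2])
  w2 = np.array([2.0, -1.0])
  b2 = 0.05
  print(network(x, W1, b1, w2, b2))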
 
 
Example: [Linear Discriminant Functions] can be seen as a two-layer Neural Network (NN)
 
 
recall |NNF_4|
 
 
|NNF_5|
 
 
write 
 
 
.. image:: x_bar.jpg
 
 
 
 
.. image:: NN_2layer_2.jpg
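
A minimal sketch of this equivalence (the variable names and the numpy dependency are assumptions): the linear discriminant decision sign(w^T x + w_0) is the same as a two-layer network whose first layer is the augmented input (1, x) and whose output node thresholds a weighted sum::

  import numpy as np

  def linear_discriminant(x, w, w0):
      # Decide class +1 or -1 from the sign of w^T x + w_0.
      return 1 if np.dot(w, x) + w0 > 0 else -1

  def two_layer_nn(x, w, w0):
      # Same decision viewed as a two-layer NN: input layer = augmented
      # input (1, x), output node = threshold of a weighted sum.
      x_bar = np.concatenate(([1.0], x))    # augmented input, as in the x_bar figure
      weights = np.concatenate(([w0], w))   # bias folded into the weight vector
      return 1 if np.dot(weights, x_bar) > 0 else -1

  x = np.array([2.0, -0.5])
  w = np.array([1.0, 3.0])
  w0 = -0.25
  print(linear_discriminant(x, w, w0), two_layer_nn(x, w, w0))  # identical decisions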
 
 
 
Example of a three-layer NN
 
 
.. image:: NN_3layer.JPG
 
 
Common types of the simple functions |NNF_2|:
 
 
linear: |linear_fx|
 
 
.. |linear_fx| image:: tex
 
  :alt: tex: f(\vec x)=\vec c^T\vec x+c_0
 
 
logistic: |logistic_fx|
 
 
.. |logistic_fx| image:: tex
 
  :alt: tex: f(x)=\frac{e^x}{1+e^x}
 
 
threshold: |threshold_fx|
 
 
.. |threshold_fx| image:: tex
 
  :alt: tex: f(x)=1,x>0;f(x)=0,else
 
 
hyperbolic tangent: |hypertan_fx|
 
 
.. |hypertan_fx| image:: tex
 
  :alt: tex: f(x)=\frac{e^x-e^{-x}}{e^x+e^{-x}}
 
 
sign function: |sign_fx|
 
 
.. |sign_fx| image:: tex
 
  :alt: tex: f(x)=1,x>0;f(x)=-1,else
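
Written out as code (a sketch only; numpy assumed), these choices look like::

  import numpy as np

  def linear(x, c, c0):
      return np.dot(c, x) + c0                 # f(x) = c^T x + c_0

  def logistic(x):
      return np.exp(x) / (1.0 + np.exp(x))     # f(x) = e^x / (1 + e^x)

  def threshold(x):
      return 1.0 if x > 0 else 0.0             # f(x) = 1 if x > 0, else 0

  def hyperbolic_tangent(x):
      return np.tanh(x)                        # f(x) = (e^x - e^{-x}) / (e^x + e^{-x})

  def sign(x):
      return 1.0 if x > 0 else -1.0            # f(x) = 1 if x > 0, else -1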
 
 
By Kolmogorov's superposition theorem, any continuous function |gx_map_R|
 
 
.. |gx_map_R| image:: tex
 
  :alt: tex: g(\vec x):[0,1]\times[0,1]\times\cdots\times[0,1]\rightarrow\Re
 
 
can be written as:
 
 
|gx_composite|
 
 
.. |gx_composite| image:: tex
 
  :alt: tex: g(\vec x)=\sum_{j=1}^{2n+1}G_j\left(\sum_{i=1}^{n}\psi_{ij}(x_i)\right)
 
 
Training Neural Networks  - "Back-Propagation Algorithm"
 
---------------------------------------------------------
 
 
.. |w_vect| image:: tex
 
  :alt: tex: \vec{w}
 
 
.. |xk_vect| image:: tex
 
  :alt: tex: \vec{x_k}
 
 
.. |zk| image:: tex
 
  :alt: tex: z_k
 
 
.. |tk| image:: tex
 
  :alt: tex: t_k
 
 
First, define a cost function to measure the error of the neural network with weights |w_vect|. Say a training input |xk_vect| produces the output |zk|, but the desired output is |tk|.
 
 
This cost function can be written as follows:
 
 
|jinha_jw|
 
 
.. |jinha_jw| image:: tex
 
  :alt: tex: J(\vec{w}) = \frac{1}{2} \sum_{k} (t_k - z_k)^2 = \frac{1}{2} \| \vec{t} - \vec{z} \| ^2
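
For example (a sketch; numpy assumed), the cost for one set of target and output values can be evaluated as::

  import numpy as np

  def cost(t, z):
      # J(w) = 1/2 * sum_k (t_k - z_k)^2, where the outputs z depend on the weights w.
      t, z = np.asarray(t), np.asarray(z)
      return 0.5 * np.sum((t - z) ** 2)

  print(cost([1.0, 0.0, 1.0], [0.8, 0.3, 0.9]))  # 0.5 * (0.04 + 0.09 + 0.01) = 0.07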
 
 
Then, we can optimize this cost function using the gradient descent method:
 
 
.. |jinha_w| image:: tex
 
  :alt: tex: \vec{w}
 
 
new |jinha_w| = old |jinha_w| + |jinha_dw|
 
 
.. |jinha_dw| image:: tex
 
  :alt: tex: \Delta \vec{w}
 
 
|jinha_gd|
 
 
.. |jinha_gd| image:: tex
 
  :alt: tex: \rightarrow \vec{w}(k+1) = \vec{w}(k) - \eta(k) \left(  \frac{\partial J}{\partial w_1}, \frac{\partial J}{\partial w_2}, \cdots , \frac{\partial J}{\partial w_{last}}  \right)
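
A minimal sketch of one such update (illustrative only; the network is reduced to a single logistic output unit so the gradient fits on one line, and numpy is assumed). The gradient is obtained by the chain rule, which is exactly what back-propagation generalizes to deeper networks::

  import numpy as np

  def logistic(a):
      return 1.0 / (1.0 + np.exp(-a))

  def gradient_descent_step(w, x, t, eta):
      # One update w(k+1) = w(k) - eta * dJ/dw for a single logistic output unit.
      z = logistic(np.dot(w, x))            # network output for input x
      # Chain rule for J = 1/2 (t - z)^2 with z = logistic(w^T x):
      # dJ/dw = -(t - z) * z * (1 - z) * x
      grad = -(t - z) * z * (1.0 - z) * x
      return w - eta * grad

  w = np.array([0.1, -0.3])
  x = np.array([1.0, 2.0])
  t = 1.0
  for _ in range(3):
      w = gradient_descent_step(w, x, t, eta=0.5)
  print(w)  # the weights move so that the output approaches the target t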
 
 
 
Previous: [Lecture 12]; Next: [Lecture 14]
 
