

(continued from [Lecture 12])

Kernel Functions
================

Last class introduced the kernel function trick as a key to making the SVM an effective tool for classifying data that is not linearly separable in the original feature space. Here we see some examples of kernel functions, and the condition that determines whether such a function corresponds to a dot product in some feature space.

Note that $ \varphi $ can be a mapping $ \varphi:\Re^k\rightarrow\mathbb{H} $, where $ \mathbb{H} $ can be $ \infty $-dimensional.

Here is an example, the "Gaussian Kernel," for which $ \mathbb{H} $ is $ \infty $-dimensional:

$ k(\vec{x},\vec{x'})=e^{-\frac{||\vec{x}-\vec{x'}||^2}{2\sigma^2}} $, where $ \sigma $ is a parameter.

It is easier to work with $ k(\vec{x},\vec{x'}) $ than with $ \varphi(\vec{x}) $.

In this example, computing the dot product $ \varphi(\vec{x}) \cdot \varphi(\vec{x'}) $ directly would require an infinite summation. The kernel function allows us to compute the distance to the hyperplane at the same computational cost as training the [SVM] in the initial data space.
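For concreteness, here is a minimal sketch (not from the lecture; the function and variable names are my own) showing that the Gaussian kernel can be evaluated directly in the input space, without ever constructing the infinite-dimensional $ \varphi $::

    import numpy as np

    def gaussian_kernel(x, x_prime, sigma=1.0):
        # k(x, x') = exp(-||x - x'||^2 / (2 sigma^2)), evaluated directly
        # in the input space; the infinite-dimensional phi is never formed.
        diff = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
        return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

    # The cost is O(k) in the input dimension k, independent of dim(H).
    x = np.array([1.0, 2.0, 3.0])
    x_prime = np.array([1.5, 1.0, 2.5])
    print(gaussian_kernel(x, x_prime, sigma=2.0))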

**For which kernels does there exist a mapping to a higher dimensional space?**

The answer lies in **[Mercer's Condition]** (Courant and Hilbert, 1953; Vapnik, 1995).

Given a kernel $ K:\Re ^k \times \Re ^k \rightarrow \Re $, there exists a $ \varphi $ and an expansion

$ k(\vec{x},\vec{x'})=\sum_{i}\varphi(\vec{x})_{i}\varphi(\vec{x'})_{i} $, where i could range over an infinite index set,

$ \Longleftrightarrow \forall g(\vec{x}) $ such that $ \int [g(\vec{x})]^{2}d\vec{x}<\infty $,

$ \int\int k(\vec{x},\vec{x'})g(\vec{x})g(\vec{x'})d\vec{x}d\vec{x'}\geq 0 $.
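As a finite-sample sanity check related to Mercer's condition (my own sketch, not part of the original notes): for sample points $ \vec{x}_1,\ldots,\vec{x}_n $, the Gram matrix $ K_{ij}=k(\vec{x}_i,\vec{x}_j) $ of a valid kernel should be positive semidefinite, which can be verified numerically::

    import numpy as np

    def gram_matrix(kernel, points):
        # K[i, j] = k(x_i, x_j) for a collection of sample points
        n = len(points)
        K = np.empty((n, n))
        for i in range(n):
            for j in range(n):
                K[i, j] = kernel(points[i], points[j])
        return K

    def is_psd(K, tol=1e-10):
        # positive semidefinite <=> all eigenvalues >= 0 (up to round-off)
        return bool(np.all(np.linalg.eigvalsh(K) >= -tol))

    # Gaussian kernel on random points: the Gram matrix should be PSD.
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(20, 3))
    gauss = lambda a, b: np.exp(-np.sum((a - b) ** 2) / 2.0)
    print(is_psd(gram_matrix(gauss, pts)))    # expected: True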


This condition is satisfied for the homogeneous polynomial kernel $ k(\vec{x},\vec{x'})=(\vec{x}\cdot\vec{x'})^p $ for any $ p\in \mathbb{N} $.

In this case, $ \varphi $ is a polynomial mapping, homogeneous of degree p in each component.

e.g., $ \varphi(\vec{x})=(\varphi_{r_1r_2\ldots r_{d_L}}(\vec{x})) $ where $ \varphi_{r_1r_2\ldots r_{d_L}}(\vec{x})=\sqrt{\frac{p!}{r_{1}!\ldots r_{d_L}!}}\;{x_1}^{r_1}\ldots {x_{d_L}}^{r_{d_L}} $

and

$ \sum_{i=1}^{d_L}r_i=p $, $ r_i\geq 0 $.
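As a sanity check (my own sketch, assuming the homogeneous polynomial kernel $ k(\vec{x},\vec{x'})=(\vec{x}\cdot\vec{x'})^p $ stated above), the explicit feature map can be compared against the kernel value for a small p and dimension::

    import numpy as np
    from itertools import product
    from math import factorial, isclose, prod, sqrt

    def poly_feature_map(x, p):
        # phi_r(x) = sqrt(p! / (r_1! ... r_d!)) * x_1^r_1 ... x_d^r_d,
        # over all multi-indices r with r_i >= 0 and sum(r) = p
        d = len(x)
        feats = []
        for r in product(range(p + 1), repeat=d):
            if sum(r) != p:
                continue
            coeff = sqrt(factorial(p) / prod(factorial(ri) for ri in r))
            feats.append(coeff * prod(xi ** ri for xi, ri in zip(x, r)))
        return np.array(feats)

    # Check that phi(x) . phi(x') equals (x . x')^p for a small example.
    x, x_prime, p = np.array([1.0, 2.0, -0.5]), np.array([0.3, -1.0, 2.0]), 3
    lhs = float(poly_feature_map(x, p) @ poly_feature_map(x_prime, p))
    rhs = float(np.dot(x, x_prime)) ** p
    print(isclose(lhs, rhs))    # expected: True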

**Example**: $ p(x,y)=7x^2-14y^2+3xy $

To visualize the separation surface, we need to find x and y such that

$ p(x,y)=0 $.

To solve such an equation, we can take a segment of y values and divide it into intervals. On each interval we fix a value of y and solve the resulting quadratic equation for x. Then we connect the resulting points to trace the surface. This example is illustrated in the figure below this paragraph.

.. image:: mortiz_lec13.gif
   :align: center
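A small sketch of the procedure just described (my own illustration, not the lecture's code): for each fixed y, $ p(x,y)=7x^2+3yx-14y^2=0 $ is a quadratic in x, so its roots follow from the quadratic formula and can be connected across a grid of y values::

    import numpy as np

    def separation_curve(y_values):
        # For each fixed y, solve 7x^2 + (3y)x - 14y^2 = 0 for x with the
        # quadratic formula; returns the two branches of roots.
        a = 7.0
        roots_plus, roots_minus = [], []
        for y in y_values:
            b, c = 3.0 * y, -14.0 * y ** 2
            disc = b ** 2 - 4.0 * a * c        # = 401 * y^2 >= 0 here
            roots_plus.append((-b + np.sqrt(disc)) / (2.0 * a))
            roots_minus.append((-b - np.sqrt(disc)) / (2.0 * a))
        return np.array(roots_plus), np.array(roots_minus)

    # Sample y on a segment and connect the (x, y) points to trace p(x, y) = 0.
    ys = np.linspace(-2.0, 2.0, 41)
    x_plus, x_minus = separation_curve(ys)
    print(x_plus[:3], x_minus[:3])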



Artificial Neural Networks
==========================

What is a Neural Network?
-------------------------


An [Artificial Neural Network] is an information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information processing system. It is composed of a large number of highly interconnected processing elements (neurones) working in unison to solve specific problems. ANNs, like people, learn by example. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process. Learning in biological systems involves adjustments to the synaptic connections that exist between the neurones. This is true of ANNs as well.

General Properties:

Neural networks, with their remarkable ability to derive meaning from complicated or imprecise data, can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained neural network can be thought of as an "expert" in the category of information it has been given to analyze. This expert can then be used to provide projections given new situations of interest and answer "what if" questions. Other advantages include:

1. Adaptive learning: An ability to learn how to do tasks based on the data given for training or initial experience.
2. Self-organization: An ANN can create its own organization or representation of the information it receives during learning time.

Neural networks are a family of function approximation techniques in which the function being approximated,

$ f:x \rightarrow z $,

is modeled as a composition of simple functions $ f_i $:

$ f=f_n\circ f_{n-1}\circ\cdots\circ f_1 $

The composition model is represented by a network.

Several of the $ f_i $'s are taken to be linear functions.

The parameters of the linear functions are optimized to best fit the data.

Example: [Linear Discriminant Functions] can be seen as a two-layer neural network (NN).

Recall $ g(\vec{x})=\vec{c}\cdot (1,\vec{x}) $ with

$ g(\vec{x}) > 0 \Rightarrow $ class 1, $ g(\vec{x}) < 0 \Rightarrow $ class 2.

For an input vector $ \vec x = \begin{pmatrix} x_1 & x_2 & x_3 & \cdots & x_n \end{pmatrix} $, write

.. image:: x_bar.jpg


.. image:: NN_2layer_2.jpg
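A minimal sketch of this two-layer view (my own, with a hypothetical weight vector): the first layer passes the augmented input $ (1, x_1, \ldots, x_n) $, and the single output node computes $ g(\vec{x})=\vec{c}\cdot(1,\vec{x}) $ and thresholds its sign::

    import numpy as np

    def linear_discriminant(x, c):
        # Two-layer view: the input layer passes (1, x_1, ..., x_n) and the
        # output node computes g(x) = c . (1, x), then thresholds its sign.
        x_bar = np.concatenate(([1.0], np.asarray(x, dtype=float)))
        g = float(np.dot(c, x_bar))
        return 1 if g > 0 else 2    # class 1 if g > 0, class 2 if g < 0

    # Hypothetical weight vector c = (c_0, c_1, c_2) and a sample input.
    c = np.array([-1.0, 2.0, 0.5])
    print(linear_discriminant([1.0, 3.0], c))   # g = 2.5 > 0, so class 1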


Example of a three-layer NN:

.. image:: NN_3layer.JPG

Common types of the functions $ f_i $:

linear: $ f(\vec x)=\vec c^T\vec x+c_0 $

logistic: $ f(x)=\frac{e^x}{1+e^x} $

threshold: $ f(x)=1 $ if $ x>0 $; $ f(x)=0 $ otherwise

hyperbolic tangent: $ f(x)=\frac{e^x-1}{e^x+1} $

sign function: $ f(x)=1 $ if $ x>0 $; $ f(x)=-1 $ otherwise
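For concreteness, these common choices of $ f_i $ can be written as simple functions (a sketch; the names below are my own)::

    import numpy as np

    def linear(x, c, c0):            # f(x) = c^T x + c0
        return float(np.dot(c, x)) + c0

    def logistic(x):                 # f(x) = e^x / (1 + e^x)
        return np.exp(x) / (1.0 + np.exp(x))

    def threshold(x):                # f(x) = 1 if x > 0, else 0
        return 1.0 if x > 0 else 0.0

    def hyperbolic_tangent(x):       # f(x) = (e^x - 1) / (e^x + 1)
        return (np.exp(x) - 1.0) / (np.exp(x) + 1.0)

    def sign_function(x):            # f(x) = 1 if x > 0, else -1
        return 1.0 if x > 0 else -1.0

    print(logistic(0.0), hyperbolic_tangent(0.0), threshold(-2.0), sign_function(-2.0))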

Any continuous $ g(\vec x):[0,1]\times[0,1]\times\cdots\times[0,1]\rightarrow\Re $

can be written as

$ g(\vec x)=\sum_{j=1}^{2n+1}G_j\left(\sum_{i}\psi_{ij}(x_i)\right) $

Training Neural Networks - "Back-Propagation Algorithm"
--------------------------------------------------------


First, define a cost function to measure the error of the neural network with weights $ \vec{w} $: say a training input $ \vec{x_k} $ produces output $ z_k $, but the desired output is $ t_k $.

This cost function can be written as

$ J(\vec{w}) = \frac{1}{2} \sum_{k} (t_k - z_k)^2 = \frac{1}{2} \| \vec{t} - \vec{z} \|^2 $

Then, we can optimize this cost function using the gradient descent method:

new $ \vec{w} $ = old $ \vec{w} $ + $ \Delta \vec{w} $

$ \rightarrow \vec{w}(k+1) = \vec{w}(k) - \eta(k) \left( \frac{\partial J}{\partial w_1}, \frac{\partial J}{\partial w_2}, \cdots , \frac{\partial J}{\partial w_{last}} \right) $
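Below is a hedged sketch (my own, not the lecture's code) of this gradient descent update for the quadratic cost $ J(\vec{w}) $, using a single linear output $ z_k=\vec{w}\cdot\vec{x}_k $ so the gradient has a closed form; with hidden layers, the same update is obtained by back-propagating the error through the chain rule::

    import numpy as np

    def train_linear_unit(X, t, eta=0.01, epochs=200):
        # Gradient descent on J(w) = 0.5 * sum_k (t_k - z_k)^2 with z_k = w . x_k.
        # For this model, dJ/dw = -sum_k (t_k - z_k) x_k.
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            z = X @ w                 # network outputs for all training inputs
            grad = -(t - z) @ X       # gradient of J with respect to w
            w = w - eta * grad        # w(k+1) = w(k) - eta(k) * grad J
        return w

    # Toy data: targets generated by a known weight vector (illustration only).
    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 3))
    true_w = np.array([0.5, -1.0, 2.0])
    t = X @ true_w
    print(train_linear_unit(X, t))    # should approach true_w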


Previous: [Lecture 12]; Next: [Lecture 14]
