ECE662: Statistical Pattern Recognition and Decision Making Processes

Spring 2008, Prof. Boutin

Collectively created by the students in the class

Lecture 4 Lecture notes


Bayes decision rule for continuous features

Let $\mathbf{x} = \left[ x_1, x_2, \cdots,x_n \right] ^{\mathbf{T}}$ be a random vector taking values in $\Re^{n}$. The random vector $\mathbf{x}$ is characterized by its pdf (probability density function) and its cdf (cumulative distribution function), also called simply its probability distribution function.

The cumulative distribution function or CDF is defined as: $P({x}) = P(x_1,\cdots,x_n) = Pr\{X_1 \le x_1, \cdots, X_n \le x_n\}$

The probability density function or pdf is defined as: $p({x}) = p(x_1,\cdots , x_n) = \displaystyle \lim_{\Delta x_i \rightarrow 0 , \forall i }{\frac{Pr\{x_1 \le X_1 \le x_1+ \Delta x_1, \cdots, x_n \le X_n \le x_n+ \Delta x_n\}}{\Delta x_1 \Delta x_2 \cdots \Delta x_n} }$
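As a quick one-dimensional check of these definitions, the sketch below (assuming a standard normal distribution; the function names are mine) approximates the pdf from the CDF using a small $\Delta x$, mirroring the limit above:

```python
import math

# Standard normal CDF, the 1-D case of the definition above
def cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Closed-form standard normal pdf, for comparison
def pdf(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

# Finite-difference approximation of the pdf from the CDF,
# i.e. the limit definition evaluated at a small Delta x
def pdf_from_cdf(x, dx=1e-6):
    return (cdf(x + dx) - cdf(x)) / dx
```

For small `dx` the finite-difference ratio agrees with the closed-form density to several decimal places.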

Each class $\omega_1, \cdots, \omega_K$ has its own "conditional density"

$p(x|w_i), i =1,\ldots,K$

Each $p(x|w_i)$ is called "class i density" in contrast to the "unconditional density function of x", also called "mixture density of x" given by:

$\displaystyle p({x}) = \sum_{k=1}^{K}P(w_k)p(x|w_k)$
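A minimal sketch of the mixture density, assuming two hypothetical Gaussian class-conditional densities with made-up priors and parameters:

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical two-class example (numbers are illustrative, not from the lecture)
priors = [0.3, 0.7]                       # P(w_1), P(w_2); must sum to 1
class_params = [(0.0, 1.0), (3.0, 1.5)]   # (mean, std) for each p(x|w_i)

def mixture_density(x):
    # p(x) = sum_k P(w_k) p(x|w_k)
    return sum(P * gaussian_pdf(x, mu, s) for P, (mu, s) in zip(priors, class_params))
```

Since each class density integrates to one and the priors sum to one, the mixture density itself integrates to one.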

Addendum to the lecture -- Since the classes $\omega_i$ are discrete, P($\omega_i$) is not a probability distribution function (CDF) of $\omega_i$. Rather, it is a probability mass function, or pmf. Refer to Duda and Hart, page 21.

Bayes Theorem:

$p(w_i|{x}) = \frac{\displaystyle p(x|w_i)P(w_i)}{\displaystyle {\sum_{k=1}^{K}p(x|w_k)P(w_k)}}$
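The theorem can be checked with a small made-up example: two classes with assumed priors and assumed likelihood values at a fixed observation $x$ (all numbers are illustrative):

```python
# Hypothetical discrete illustration of Bayes' theorem
priors = {"w1": 0.4, "w2": 0.6}          # P(w_i)
likelihood = {"w1": 0.9, "w2": 0.2}      # p(x|w_i) evaluated at the observed x

# Denominator: the mixture density p(x) = sum_k p(x|w_k) P(w_k)
evidence = sum(likelihood[w] * priors[w] for w in priors)

# Posterior p(w_i|x) = p(x|w_i) P(w_i) / p(x)
posterior = {w: likelihood[w] * priors[w] / evidence for w in priors}
```

By construction the posteriors sum to one, since the denominator is exactly the sum of the numerators.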

Bayes Rule: Given X=x, decide $\omega_i$ if

$p(w_i|x) \ge p(w_j|x), \forall j$

$\Longleftrightarrow p(x|w_i) \frac{\displaystyle P(w_i)}{\displaystyle \sum_{k=1}^{K}p(x|w_k)P(w_k)} \ge p(x|w_j) \frac{\displaystyle P(w_j)}{\displaystyle \sum_{k=1}^{K}p(x|w_k)P(w_k)} , \forall j$
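Because the denominator $\sum_{k=1}^{K}p(x|w_k)P(w_k)$ is the same on both sides, it cancels, and the rule reduces to choosing the class that maximizes $p(x|w_i)P(w_i)$. A minimal sketch with hypothetical numbers:

```python
def bayes_decide(likelihoods, priors):
    # The shared denominator cancels, so the Bayes rule reduces to
    # choosing argmax_i p(x|w_i) P(w_i)
    return max(range(len(priors)), key=lambda i: likelihoods[i] * priors[i])

# Made-up values of p(x|w_i) at some observed x, and priors P(w_i):
# class 0 wins since 0.9 * 0.4 = 0.36 > 0.2 * 0.6 = 0.12
decision = bayes_decide([0.9, 0.2], [0.4, 0.6])
```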

The Bayes rule to minimize the expected loss ([Loss Functions]), or "risk":

- We consider a slightly more general setting of k+2 classes:

$w_1, \cdots, w_k$, D, O, where D="doubt class" and O="outlier/other class"

Let $L(w_l|w_k)$ be the loss incurred by deciding class $w_l$ when the true class is $w_k$.

Usually, $L(w_k|w_k)=0, \forall k$

If every misclassification is equally bad we define:

$L(w_l|w_k)= \begin{cases} 0, & l=k \text{ (correct)} \\ 1, & l \neq k \text{ (incorrect)} \end{cases}$

We could also include the cost of doubting:

$L(w_l|w_k)= \begin{cases} 0, & l=k \\ 1, & l \neq k,\ w_l \neq D \\ d, & w_l=D \end{cases}$
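Under this 0/1/$d$ loss, the conditional risk of deciding $w_l$ is $1 - p(w_l|x)$, while doubting costs exactly $d$. The risk-minimizing rule therefore picks the class with the largest posterior unless even that choice leaves a risk above $d$. A sketch with made-up posterior values:

```python
def decide_with_doubt(posteriors, d):
    # Under the 0/1/d loss, the conditional risk of deciding w_l is
    # 1 - p(w_l|x), while doubting costs d: pick the class with the
    # largest posterior unless even that risk exceeds d.
    best = max(range(len(posteriors)), key=lambda i: posteriors[i])
    if 1.0 - posteriors[best] > d:
        return "D"  # doubt class
    return best

# Hypothetical posteriors p(w_i|x): an ambiguous x triggers doubt,
# a confident one does not.
ambiguous = decide_with_doubt([0.55, 0.45], d=0.2)   # risk 0.45 > 0.2 -> "D"
confident = decide_with_doubt([0.90, 0.10], d=0.2)   # risk 0.10 <= 0.2 -> class 0
```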

Example: Two classes of fish in a lake: trout and catfish

$L(trout|catfish) = \$2, \qquad L(catfish|trout) = \$3$

$\displaystyle L(trout|trout) = 0$

$\displaystyle L(catfish|catfish)= 0$
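With these losses, the conditional risks follow directly: $R(trout|x) = 2\,P(catfish|x)$ and $R(catfish|x) = 3\,P(trout|x)$, and we decide whichever is smaller. A sketch assuming hypothetical posteriors at some observed $x$ (the numbers are made up):

```python
# Hypothetical posteriors at some observed x
p_trout, p_catfish = 0.6, 0.4

# Conditional risks using the losses above:
# R(trout|x)   = L(trout|catfish) * P(catfish|x)
# R(catfish|x) = L(catfish|trout) * P(trout|x)
risk_trout = 2.0 * p_catfish     # cost of calling it a trout
risk_catfish = 3.0 * p_trout     # cost of calling it a catfish

decision = "trout" if risk_trout < risk_catfish else "catfish"
```

Note that the asymmetric losses can flip the decision away from the most probable class when the posteriors are close.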

Experiments and notes

Previous: Lecture 3 Next: Lecture 5