Revision as of 21:18, 30 April 2014 by Lee714 (Talk | contribs)


Neyman-Pearson Lemma and Receiver Operating Characteristic Curve

A slecture by ECE student Soonam Lee

Partly based on the ECE662 Spring 2014 lecture material of Prof. Mireille Boutin.

click here for PDF version


Introduction

The purpose of this slecture is understanding Neyman-Pearson Lemma and Receiver Operating Characteristic (ROC) curve from theory to application. The fundamental theories stem from statistics and these can be used for signal detection and classification. In order to firmly understand these concepts, we will review statistical concept first including False alarm and Miss. After that, we will consider Bayes' decision rule using cost function. Next, we visit Neyman-Pearson test and minimax test. Lastly, we will discuss ROC curves. Note that we only consider two classes case in this slecture, but the concept can be extended to multiple classes.


The General Two Classes Case Problem

Before starting our discussion, we have the following setup:

  • $ X $ a measure random variable, random vector, or random process
  • $ x \in \mathcal{X} $ is a realization of $ X $
  • $ \theta \in \Theta $ are unknown parameters
  • $ f(x; \theta) $ is pdf of $ X $ (a known function)

Two distinct hypotheses on $ \theta $ such that $ H_0: \theta \in \Theta_0 $ versus $ H_1: \theta \in \Theta_1 $ where $ \Theta_0 $, $ \Theta_1 $ is partition of $ \Theta $ into two disjoint regions

$ \Theta_0 \cup \Theta_1 = \Theta, \qquad \Theta_0 \cap \Theta_1 = \phi $

False Alarm Rate and Miss Rate

From statistical references~\cite{bib:Casella2001, bib:Ghosh_note2012}, these two hypotheses make one of two types of error, named Type I Error and Type II Error. If $\theta$ $\in$ $\Theta_0$ but the hypothesis test incorrectly decides to reject $H_0$, then the test has made a \emph{Type I Error}. If, on the other hand, $\theta$ $\in$ $\Theta_1$ but the test decides to accept $H_0$ a \emph{Type II Error} has been made.

\begin{table}[htbp!] \centering \begin{tabular}{ c c | c | c } \hline & & \multicolumn{2}{c}{Decision} \\ % \cline{3-4} & & Accept $H_0$ & Reject $H_0$ \\

                 \hline

\multirow{2}{*}{Truth} & $H_0$ & Correct Decision & Type I Error \\ \cline{3-4} & $H_1$ & Type II Error & Correct Decision \\ \hline \end{tabular} \caption{Two types of errors in hypothesis testing} \label{tab:type_error} \end{table} \emph{Type I Error} is often called false positive error and \emph{Type II Error} is called false negative error. These two types are depicted in Table~\ref{tab:type_error}.

Different from statistical perspective, engineers focus more on false alarm rate and miss errors rate. \emph{False alarm} rate is the same as \emph{Type I Error} such that a given condition has been fulfilled, when it actually has not been fulfilled, that is, erroneously a positive effect has been assumed. \emph{Miss} rate is such that a given condition has not been fulfilled, when it actually has been fulfilled. In general, to compute \emph{miss} rate, computing \emph{hit} rate and subtract \emph{hit} rate from 1 to get \emph{miss} rate.

Let's assume a test function $\phi(x)$ such that \begin{eqnarray*} \phi(x) = \left\{

 \begin{array}{l l}
  1 & \quad \text{decide $H_1$}\\
  0 & \quad \text{decide $H_0$}\\
  \end{array} \right. .

\end{eqnarray*} The test function $\phi(x)$ maps $\mathcal{X}$ to the decision space $\{$0, 1$\}$ for deciding $H_0$ and $H_1$. The function $\phi(x)$ induces a partition of $\mathcal{X}$ into decision regions. Denote $\mathcal{X}_0 = \{x: \phi(x) = 0 \}$ and $\mathcal{X}_1 = \{x: \phi(x) = 1 \}$. Then, \emph{false alarm} and \emph{miss} probabilities with the function $\phi$ can be expressed as follow: \begin{eqnarray*} P_F (\theta) = E_\theta [\phi] &=& \int_\mathcal{X} \phi(x) f(x; \theta) dx, \quad \theta \in \Theta_0 \\ &=& \int_{\mathcal{X}_1} f(x|\theta) dx, \quad \theta \in \Theta_0 \\ P_M (\theta) = E_\theta [1 - \phi] &=& \int_\mathcal{X} [1 - \phi(x)] f(x; \theta) dx, \quad \theta \in \Theta_1 \\ &=& 1 - \int_{\mathcal{X}_1} f(x|\theta) dx, \quad \theta \in \Theta_1. \end{eqnarray*} The probability of correctly deciding $H_1$ is called \emph{hit} probability such that \begin{eqnarray*} 1- P_M (\theta) = P_D(\theta) = E_\theta [\phi], \quad \theta \in \Theta_1. \end{eqnarray*}

From the ECE 662 class note~\cite{bib:Boutin_note2014}, Professor introduced error 1 ($\epsilon_1$) and error 2 ($\epsilon_2$) that are basically same concepts as \emph{miss} rate and \emph{false alarm} rate respectively. The only difference is representing the expectation values as fraction as follow: \begin{eqnarray*} \epsilon_1 &=& \frac{\text{\# of test data points labeled as class $\Theta_0$ misclassified as class $\Theta_1$}}{\text{\# test data points in class $\Theta_0$}} \\ \epsilon_2 &=& \frac{\text{\# of test data points labeled as class $\Theta_1$ misclassified as class $\Theta_0$}}{\text{\# test data points in class $\Theta_1$}}. \end{eqnarray*}


Working now...

Post your slecture material here. Guidelines:

  • If you are making a text slecture
    • Type text using wikitext markup languages
    • Type all equations using latex code between <math> </math> tags.
    • You may include links to other Project Rhea pages.





References

[1] G. Casella and R. L. Berger, Statistical inference (2nd Edition), Cengage Learning, 2001.
[2] J. K. Ghosh, \STAT528: Introduction to Mathematical Statistics," Fall 2012, Purdue University.
[3] M. Boutin, \ECE662: Statistical Pattern Recognition and Decision Making Process," Spring 2014, Purdue University.

Questions and comments

If you have any questions, comments, etc. please post them on this page.


Back to ECE662, Spring 2014

Alumni Liaison

Ph.D. 2007, working on developing cool imaging technologies for digital cameras, camera phones, and video surveillance cameras.

Buyue Zhang