From Bayes Theorem to Pattern Recognition via Bayes Rule - Rhea

Bayes' Theorem and its Application to Pattern Recognition

History

Bayes' Theorem takes its name from the mathematician Thomas Bayes. For an accurate and detailed information about him, you might want to read his biography by Prof. D.R. Bellhouse \cite{Bellhouse}

Note

This tutorial assumes familiarity with the following--

The axioms of probability
Definition of conditional probability

Bayes' Theorem

Let us revisit conditional probability through an example and then gradually move onto Bayes' theorem.

Example

Problem: In Spring 2014, in the Computer Science (CS) Department of Purdue University, 200 students registered for the course CS180 (Problem Solving and Object Oriented Programming). 30% of the registered students are CS majors and the rest are non-majors. From the student registration data we observe that 80% of the CS majors are males, where as only 40% of non-majors are males. Find the following:

The probability that a randomly selected student is a CS major.
The probability that the selected student is a CS major and a male.
The probability that the selected student is a male.
Given that the selected student is a male what is the probability that he is a CS major? How is this different from the probability computed in part 1.

Solution:

Notation: Let $$ CS $$ be the event that the selected student is a computer science major and let $$ M $$ be the event that the selected student is a male. Therefore, we can define the four events shown below and, summarize the information given problem in the form of a table \cite{Triola} or in the form of a probability tree.

CS\equiv \text{CS major; } \overline{CS}\equiv \text{non-major; } M\equiv \text{male; } \overline{M}\equiv \text{female}

The elements of the table excluding the legends (or captions) can be considered as a 3x3 matrix. Let (1,1) represent the first cell of the matrix. The content of (1,1) is computed first, followed by content of (1,2) and then (1,3). The same is then done with the second and third rows of the matrix.

The probability tree in Figure \ref{fig:Probability tree} is drawn by considering events as sequential. The number of branches in the probability tree depends on the number of events (i.e., how much you know about the system). The numbers on the branches denote the conditional probabilities.

From the table, $\textbf{P}(CS) = \frac{\text{No. of CS majors}}{\text{Total no. of Students}} = \frac{60}{200} = 0.3$
This is nothing but the fraction of the total students who are CS majors.
From the table, x = (y² + 2). $\textbf{P}(CS\cap M) = \frac{\text{No. of CS majors who are also males}}{\text{Total no. of Students}} = \frac{48}{200} = 0.24$
Using the probability tree we can interpret $\textstyle(CS\cap M)$ as the occurrence of event \$CS\$ followed by the occurrence of event $M$. Therefore,\\ \$\textbf{P}(CS\cap M) = \textbf{P}(CS)\times\textbf{P}(M\vert CS) = 0.3\times0.8 = 0.24 \$ (multiplication rule)

Algebra of Sets

Union is commutative ⇔ A ∪ B = B ∪ A (proof)
Intersection is commutative ⇔ A ∩ B = B ∩ A (proof)
Union is associative ⇔ A ∪ (B ∪ C) = (A ∪ B) ∪ C (proof)
Intersection is associative ⇔ A ∩ (B ∩ C) = (A ∩ B) ∩ C (the proof is similar to the one for the associative property of the union)
Intersection is distributive over union ⇔ A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) (proof)
Union is distributive over intersection ⇔ A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) (proof)
(A')' = A (proof)
De Morgan's Law: (A ∩ B)' = A' ∪ B' (proof)
De Morgan's Law: (A ∪ B)' = A' ∩ B' (proof)
S' = Ø, where S is the sample space (proof)
A ∩ S = A (proof)
A ∩ Ø = Ø (proof)
A ∪ S = S (proof)
A ∪ Ø = A (proof)
A ∪ A' = S (proof)
A ∩ A' = Ø (proof)

A Note on Sets of Different Sizes

We can categorize the sets we encounter three ways:

A set $$ A $$ is finite if it contains an finite number of elements, i.e. the number of elements in the set is some natural number $$ n $$ . Then we can list the elements in $$ A $$ ; e.g. $A = \{x_1,...,x_n\}$ .

A set is countable if its elements can be put into a one-to-one correspondence (a bijection) with the integers. In this course when, we say countable, we mean countably infinite. We may write $A=\{x_1,x_2,...\}$ . The set of rationals is a countable set.

A set is uncountable if it is not finite or countable. A set that is uncountable cannot be written as $\{x_1,x_2,...\}$ . Note that the set of reals as well as any interval in R is uncountable.

Note that a finite or countable space (a collection of sets) may contain elements that are uncountable. For example the set {Ø,[0,1]} is a finite set with 2 elements but the interval [0,1] is uncountable.

If you are interested to learn more about countable and uncountable sets, you may find this Math Squad tutorial useful.

We will often consider indexed collections of sets such as

\{A_i, i\in I\}

where $$ I $$ is called the index set.

The index set can be

finite: $I=\{1,...,n\}$ for some finite natural number n so that the collection of sets is $I=\{A_1,...,A_n\}$

countable: $$ I $$ is the set of natural numbers i.e. $I=\{1,2,3,...\}$ , so the collection is $\{A_1,A_2,A_3,...\}$

uncountable: so the collection is $\{A_{\alpha}, \alpha$ ∈ $I\}$ cfor an uncountable set $$ I $$ . If $$ I=R $$ , the set of reals, then the set in the collection can be written as $A_{\alpha}$ for some real number $\alpha$

Definition $\qquad$ The union of an indexed family of sets is defined as

\bigcup_{i \in I} A_i = \{\omega \in \mathcal{S}:\omega\in A_i \;\mbox{for at least one} \; i\in I \}

Note that if $$ I $$ is finite, we can write the union as

\bigcup_{i \in I}^n A_i

If $$ I $$ is countable, we can write the union as

\bigcup_{i \in I}^{\infty} A_i

If $$ I $$ is uncountable, we can write the union as

\bigcup_{i \in I} A_i

as in the definition.

Definition $\qquad$ The intersection of an indexed family of sets is defined as

\bigcap_{i \in I} A_i = \{\omega \in \mathcal{S}:\omega\in A_i \; \forall \; i\in I \}

Note that if $$ I $$ is finite, we can write the intersection as

\bigcap_{i \in I}^n A_i

If $$ I $$ is countable, we can write the intersection as

\bigcap_{i \in I}^{\infty} A_i

If $$ I $$ is uncountable, we can write the intersection as

\bigcap_{i \in I} A_i

.

Definition $\qquad$ $\{A_i, i$ ∈ $I\}$ is disjoint if $$ A_i $$ ∩ $$ A_j $$ = Ø ∀ i,j∈I, i≠j.

Definition $\qquad$ the collection $\{A_i, i$ ∈ $I\}$ is a partition of S, the sample space, if it is disjoint and if

\bigcup_{i \in I} A_i = \mathcal{S}

References

M. Comer. ECE 600. Class Lecture. Random Variables and Signals. Faculty of Electrical Engineering, Purdue University. Fall 2013.

Questions and comments

If you have any questions, comments, etc. please post them on this page

Back to all ECE 600 notes
Next Topic: Probability Spaces