[[Category:math]]
[[Category:tutorial]]
[[Category:bayes rule]]
[[Category:conditional probability]]
[[Category:math squad]]
 
== Bayes' Theorem ==
by [[user:Mhossain | Maliha Hossain]], proud Member of [[Math_squad | the Math Squad]].
----
<pre>keyword: probability, Bayes' Theorem, Bayes' Rule </pre>
  
 
'''INTRODUCTION'''

Bayes' Theorem (or Bayes' Rule) allows us to calculate P(A|B) from P(B|A), given that P(A) and P(B) are also known, where A and B are events. In this tutorial, we will derive Bayes' Theorem and illustrate it with a few examples. After going over the examples, if you have any questions or if you find any mistakes, please leave me a comment at the end of the relevant section.

Note that this tutorial assumes familiarity with conditional probability and the axioms of probability. If you are interested in the derivation of the conditional distributions for continuous and discrete random variables, you may wish to go over Professor Mary Comer's [[ECE600_F13_rv_conditional_distribution_mhossain|notes]] on the subject.
 
<pre> Contents

- Bayes' Theorem
- Proof
- Example Problems
- References
</pre>
 
== Bayes' Theorem ==

Let <math>B_1, B_2, ..., B_n</math> be a partition of the sample space <math>S</math>, i.e. <math>B_1, B_2, ..., B_n</math> are mutually exclusive events whose union equals the sample space S. Suppose that the event <math>A</math> occurs. Then, by Bayes' Theorem, we have that
  
<math>P[B_j|A] = \frac{P[A|B_j]P[B_j]}{P[A]}, \quad j = 1, 2, \ldots, n</math>

Bayes' Theorem is also often expressed in the following form:

<math>P[B_j|A] = \frac{P[A|B_j]P[B_j]}{\sum_{k=1}^n P[A|B_k]P[B_k]}</math>
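
As a quick illustration of the mechanics, here is a minimal sketch in Python. The priors and likelihoods below are made-up numbers chosen purely for illustration; they are not taken from any of the examples in this tutorial.

<pre>
# Posteriors P[B_j|A] from priors P[B_j] and likelihoods P[A|B_j],
# using the second form of Bayes' Theorem.
def posteriors(priors, likelihoods):
    # Denominator: P[A] = sum_k P[A|B_k] P[B_k] (theorem of total probability)
    p_a = sum(l * p for l, p in zip(likelihoods, priors))
    return [l * p / p_a for l, p in zip(likelihoods, priors)]

# Hypothetical three-event partition; the priors must sum to 1.
priors = [0.5, 0.3, 0.2]          # P[B_1], P[B_2], P[B_3]
likelihoods = [0.10, 0.05, 0.60]  # P[A|B_1], P[A|B_2], P[A|B_3]
print(posteriors(priors, likelihoods))  # the posteriors also sum to 1
</pre>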
 
----
== Proof ==
  
We will now derive Bayes' Theorem as it is expressed in the second form, which simply takes the expression one step further than the first.

Let <math>A</math> and <math>B_j</math> be as defined above. By the definition of conditional probability, we have that
  
<math>P[A|B_j] = \frac{P[A\cap B_j]}{P[B_j]}</math>
  
Multiplying both sides by <math>P[B_j]</math>, we get
  
<math>P[A\cap B_j] = P[A|B_j]P[B_j] \ </math>
  
Using the same argument as above, we have that
  
<math>
\begin{align}
P[B_j|A] & = \frac{P[B_j\cap A]}{P[A]} \\
\Rightarrow P[B_j\cap A] &= P[B_j|A]P[A]
\end{align}
</math>
  
Because of the commutativity property of intersection, we can say that
  
<math> P[B_j|A]P[A] = P[A|B_j]P[B_j] \ </math>
  
Dividing both sides by <math>P[A]</math>, we get
  
<math> P[B_j|A] = \frac{P[A|B_j]P[B_j]}{P[A]}</math>
  
Finally, the denominator can be broken down further using the theorem of total probability, so that we have the following expression:
  
<math>P[B_j|A] = \frac{P[A|B_j]P[B_j]}{\sum_{k=1}^n P[A|B_k]P[B_k]}</math>
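
The derivation can also be sanity-checked numerically. The following sketch (again with made-up numbers, this time for a two-event partition) estimates <math>P[B_1|A]</math> by simulation and compares it with the value given by Bayes' Theorem.

<pre>
import random

# Hypothetical two-event partition B_1, B_2 and event A.
p_b = [0.3, 0.7]          # priors P[B_1], P[B_2]
p_a_given_b = [0.9, 0.2]  # likelihoods P[A|B_1], P[A|B_2]

# Monte Carlo estimate of P[B_1|A]: draw B_k from the priors,
# then decide whether A occurred, and count.
trials, count_a, count_b1_and_a = 1000000, 0, 0
for _ in range(trials):
    k = 0 if random.random() < p_b[0] else 1  # which partition event occurred
    if random.random() < p_a_given_b[k]:      # did A occur given B_k?
        count_a += 1
        count_b1_and_a += (k == 0)
print(count_b1_and_a / count_a)  # simulated P[B_1|A], approx. 0.66

# Bayes' Theorem gives 0.9*0.3 / (0.9*0.3 + 0.2*0.7) = 0.6585...
print(0.9 * 0.3 / (0.9 * 0.3 + 0.2 * 0.7))
</pre>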
----
  
== Example Problems ==
  
[[bayes_theorem_eg1_S13|Example 1: Quality Control]]
  
[[bayes_theorem_eg2_S13|Example 2: False Positive Paradox]]

[[bayes_theorem_eg3_S13|Example 3: Monty Hall Problem]]
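
The linked pages work each problem in detail. As a quick preview of Example 2, the false positive paradox, here is a minimal sketch; the 1% prevalence and the test error rates below are hypothetical numbers chosen for illustration, not necessarily those used on the linked page.

<pre>
# False positive paradox: B_1 = has disease, B_2 = healthy, A = positive test.
p_disease = 0.01             # P[B_1]: prevalence (hypothetical)
p_pos_given_disease = 0.95   # P[A|B_1]: sensitivity (hypothetical)
p_pos_given_healthy = 0.05   # P[A|B_2]: false positive rate (hypothetical)

# P[A] by the theorem of total probability
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' Theorem: P[disease | positive test]
print(p_pos_given_disease * p_disease / p_pos)  # approx. 0.16
</pre>

Even though the test is rarely wrong, a positive result still leaves only about a 16% chance of disease, because healthy people vastly outnumber sick ones.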
----
  
== References ==
  
* Alberto Leon-Garcia, ''Probability, Statistics, and Random Processes for Electrical Engineering,'' Third Edition
----
==Questions and comments==
  
If you have any questions, comments, etc. please post them below:
  
* Comment / question 1
 
 
----
[[Math_squad|Back to Math Squad page]]
  
<div style="font-family: Verdana, sans-serif; font-size: 14px; text-align: justify; width: 70%; margin: auto; border: 1px solid #aaa; padding: 2em;">
 
The Spring 2013 Math Squad was supported by an anonymous [https://www.projectrhea.org/learning/donate.php gift] to [https://www.projectrhea.org/learning/about_Rhea.php Project Rhea]. If you enjoyed reading these tutorials, please help Rhea "help students learn" with a [https://www.projectrhea.org/learning/donate.php donation] to this project. Your [https://www.projectrhea.org/learning/donate.php contribution] is greatly appreciated.
</div>
