=BPE FOR MULTIVARIATE GAUSSIAN=
for [[ECE662:BoutinSpring08_Old_Kiwi|ECE662: Decision Theory]]
  
Complement to  [[Lecture_7_-_MLE_and_BPE_OldKiwi|Lecture 7: Maximum Likelihood Estimation and Bayesian Parameter Estimation]], [[ECE662]], Spring 2010, Prof. Boutin
----
== Estimation of mean, given a known covariance ==
 
Consider a set of iid samples <math>\{X_i\}_{i=1}^N</math> where <math>X_i \in\mathbb{R}^n</math> is such that <math>X_i \sim N(\mu,\Sigma)</math>.  Suppose we know <math>\Sigma</math>, but wish to estimate <math>\mu</math> using BPE.  If we assume a Gaussian prior distribution for the unknown mean, we obtain a posterior distribution for the mean which is also Gaussian, i.e. <math>p(\mu|X_1,X_2,\ldots,X_N) = N(\mu_N,\Sigma_N)</math>, where <math>\mu_N</math> and <math>\Sigma_N</math> combine our prior knowledge of <math>\mu</math> with the samples <math>\{X_i\}_{i=1}^N</math>.  Fukunaga p. 391 derives the parameters <math>\mu_N</math> and <math>\Sigma_N</math> as follows:

<math>\mu_N = \frac{\Sigma}{N}(\Sigma_\mu +  \frac{\Sigma}{N})^{-1}\mu_0 + \Sigma_\mu(\Sigma_\mu + \frac{\Sigma}{N})^{-1}\left(\frac1N\sum_{i=1}^NX_i\right)</math>,
  
where <math>\mu_0</math> is the initial "guess" for the mean <math>\mu</math>, and <math>\Sigma_\mu</math> expresses the "confidence" in that guess.  In other words, <math>N(\mu_0,\Sigma_\mu)</math> is the prior distribution we would assume for <math>\mu</math> before seeing any samples.  For the posterior covariance <math>\Sigma_N</math>, we have
  
<math>\Sigma_N = \Sigma_\mu(\Sigma_\mu+\frac{\Sigma}{N})^{-1}\frac{\Sigma}{N}</math>.
  
We find that, as the number of samples increases, the effect of the prior knowledge (<math>\mu_0</math>, <math>\Sigma_\mu</math>) decreases, so that
  
<math>\lim_{N\rightarrow\infty}\mu_N = \frac1N\sum_{i=1}^NX_i</math>, and <math>\lim_{N\rightarrow\infty}\Sigma_N = 0</math>.
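
To make the update concrete, below is a minimal numerical sketch in Python/NumPy (not from Fukunaga; the data, the prior guess <math>\mu_0</math>, the confidence <math>\Sigma_\mu</math>, and the function name bpe_mean are all invented for illustration). It evaluates <math>\mu_N</math> and <math>\Sigma_N</math> from the expressions above and shows <math>\mu_N</math> approaching the sample mean as <math>N</math> grows.

<pre>
import numpy as np

def bpe_mean(X, Sigma, mu0, Sigma_mu):
    # Posterior N(mu_N, Sigma_N) for the mean of N(mu, Sigma),
    # given known Sigma and the Gaussian prior N(mu0, Sigma_mu) on mu.
    N = X.shape[0]
    sample_mean = X.mean(axis=0)
    S_over_N = Sigma / N
    A = np.linalg.inv(Sigma_mu + S_over_N)      # (Sigma_mu + Sigma/N)^{-1}
    mu_N = S_over_N @ A @ mu0 + Sigma_mu @ A @ sample_mean
    Sigma_N = Sigma_mu @ A @ S_over_N           # posterior covariance of the mean
    return mu_N, Sigma_N

# Illustrative values only: true parameters and a deliberately wrong prior guess.
rng = np.random.default_rng(0)
mu_true = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])      # known covariance of the samples
mu0 = np.zeros(2)                               # initial guess for mu
Sigma_mu = np.eye(2)                            # confidence in that guess

for N in (5, 50, 5000):
    X = rng.multivariate_normal(mu_true, Sigma, size=N)
    mu_N, Sigma_N = bpe_mean(X, Sigma, mu0, Sigma_mu)
    print(N, mu_N, np.linalg.norm(Sigma_N))     # mu_N -> sample mean, Sigma_N -> 0
</pre>

For large <math>N</math> the printed <math>\mu_N</math> is essentially the sample mean and the norm of <math>\Sigma_N</math> is close to zero, consistent with the limits above.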
  
== Estimation of covariance, given a known mean ==
Again, given iid samples <math>\{X_i\}_{i=1}^N</math>, <math>X_i \in\mathbb{R}^n</math>, <math>X_i \sim N(\mu,\Sigma)</math>, let us now estimate <math>\Sigma</math> with <math>\mu</math> known.  As in Fukunaga p. 392, we assume the samples are conditionally normal given <math>\Sigma</math> (i.e. <math>p(X|\Sigma) = N(\mu,\Sigma)</math>), and it can be shown that the sample covariance matrix follows a Wishart distribution.  Fukunaga p. 392 gives the distribution <math>p(K|\Sigma_0,N_0)</math>, where <math>K = \Sigma^{-1}</math>, the parameter <math>\Sigma_0</math> represents the initial "guess" for <math>\Sigma</math>, and <math>N_0</math> represents how many samples were notionally used to compute <math>\Sigma_0</math>.  Note that we work with <math>K = \Sigma^{-1}</math> instead of <math>\Sigma</math> directly, since the inverse covariance matrix appears in the definition of the normal density.  It can be shown, then, that
  
<math>p(K|\Sigma_0,N_0) = c(n,N_0)\left|\frac12N_0\Sigma_0\right|^{(N_0-1)/2}|K|^{(N_0-n-2)/2}\exp(-\frac12\mathrm{trace}(N_0\Sigma_0K))</math>,
  
where <math>c(n,N_0) = \left\{\pi^{n(n-1)/4}\prod_{i=1}^n\Gamma\left(\frac{N_0-i}{2}\right)\right\}^{-1}</math>.
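
As a sanity check on this expression, here is a small sketch in Python/NumPy (using scipy.special.gammaln; the matrices <math>\Sigma_0</math> and <math>K</math>, the count <math>N_0</math>, and the function names are made-up illustrative choices) that evaluates the log of <math>p(K|\Sigma_0,N_0)</math> directly from the two formulas above.

<pre>
import numpy as np
from scipy.special import gammaln

def log_c(n, N0):
    # log c(n, N0) = -( n(n-1)/4 * log(pi) + sum_{i=1}^{n} log Gamma((N0 - i)/2) )
    return -(n * (n - 1) / 4.0 * np.log(np.pi)
             + sum(gammaln((N0 - i) / 2.0) for i in range(1, n + 1)))

def log_p_K(K, Sigma0, N0):
    # log p(K | Sigma0, N0) evaluated term by term from the density above.
    n = K.shape[0]
    _, logdet_half = np.linalg.slogdet(0.5 * N0 * Sigma0)   # log |(1/2) N0 Sigma0|
    _, logdet_K = np.linalg.slogdet(K)                       # log |K|
    return (log_c(n, N0)
            + (N0 - 1) / 2.0 * logdet_half
            + (N0 - n - 2) / 2.0 * logdet_K
            - 0.5 * np.trace(N0 * Sigma0 @ K))

# Made-up illustrative values (N0 should exceed the dimension n).
Sigma0 = np.array([[1.0, 0.3], [0.3, 2.0]])   # initial guess for Sigma
N0 = 10                                       # notional samples behind that guess
K = np.linalg.inv(np.array([[1.2, 0.2], [0.2, 1.8]]))
print(log_p_K(K, Sigma0, N0))
</pre>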
  
== Simultaneous estimation of unknown mean and covariance ==
Finally, given iid samples <math>\{X_i\}_{i=1}^N</math>, <math>X_i \in\mathbb{R}^n</math>, <math>X_i \sim N(\mu,\Sigma)</math>, we now wish to estimate both <math>\mu</math> and <math>\Sigma</math> (or <math>K = \Sigma^{-1}</math>). Fukunaga p. 393 gives that the joint distribution of <math>\mu</math> and <math>K</math> follows a Gauss-Wishart distribution:
 
<math>p(\mu,K|\mu_0,\Sigma_0,\mu_{\Sigma},N_0) = (2\pi)^{-n/2}|\mu_{\Sigma} K|^{1/2}\exp\left(-\frac12\mu_{\Sigma}(\mu-\mu_0)^TK(\mu-\mu_0) \right)\times c(n,N_0)\left|\frac12N_0\Sigma_0\right|^{(N_0-1)/2}|K|^{(N_0-n-2)/2}\exp\left(-\frac12\mathrm{trace}(N_0\Sigma_0K)\right)</math>,
where <math>\mu_0</math>, <math>\Sigma_0</math>, <math>N_0</math>, and <math>c(n,N_0)</math> are as above, and the scalar <math>\mu_{\Sigma}</math> expresses the confidence in the prior mean <math>\mu_0</math> (conditioned on <math>K</math>, the distribution of <math>\mu</math> is <math>N(\mu_0,(\mu_{\Sigma}K)^{-1})</math>).
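
Continuing the sketch from the previous section (it reuses log_p_K, Sigma0, N0, and K defined there; <math>\mu_{\Sigma}</math> is treated as a scalar, and all values remain illustrative rather than prescribed), the log of this joint density splits into a Gaussian term in <math>\mu</math> plus the Wishart term in <math>K</math>:

<pre>
import numpy as np

def log_p_mu_K(mu, K, mu0, Sigma0, mu_sigma, N0):
    # Gaussian factor in mu with precision mu_sigma * K, times the Wishart
    # factor log_p_K defined in the previous sketch.
    n = K.shape[0]
    _, logdet = np.linalg.slogdet(mu_sigma * K)   # log |mu_Sigma K|
    diff = mu - mu0
    log_gauss = (-0.5 * n * np.log(2.0 * np.pi)
                 + 0.5 * logdet
                 - 0.5 * mu_sigma * diff @ K @ diff)
    return log_gauss + log_p_K(K, Sigma0, N0)

# Illustrative values; Sigma0, N0, K, and log_p_K come from the previous sketch.
mu0 = np.zeros(2)
mu_sigma = 4.0                                    # scalar confidence in mu0
mu = np.array([0.5, -0.2])
print(log_p_mu_K(mu, K, mu0, Sigma0, mu_sigma, N0))
</pre>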
----
Back to  [[Lecture_7_-_MLE_and_BPE_OldKiwi|Lecture 7: Maximum Likelihood Estimation and Bayesian Parameter Estimation]], [[ECE662]], Spring 2010, Prof. Boutin