
                                            Bayes Parameter Estimation (BPE) tutorial

                                                A slecture by ECE student Haiguang Wen

                       Partially based on the ECE662 lecture material of Prof. Mireille Boutin.


 What will you learn from this slecture?

  • Basic knowledge of Bayes parameter estimation
  • An example to illustrate the concept and properties of BPE
  • The effect of sample size on the posterior
  • The effect of prior on the posterior



Introduction

Bayes parameter estimation (BPE) is a widely used technique for estimating the probability density function of random variables with unknown parameters. Suppose that we have an observable random variable X for an experiment and that its distribution depends on an unknown parameter θ taking values in a parameter space Θ. The probability density function of X for a given value of θ is denoted by p(x|θ). It should be noted that the random variable X and the parameter θ can be vector-valued. Now suppose we obtain a set of independent observations, or samples, S = {x1,x2,...,xn} from an experiment. Our goal is to compute p(x|S), which is as close as we can come to obtaining the unknown p(x), the probability density function of X.

In Bayes parameter estimation, the parameter θ is viewed as a random variable or random vector following a distribution p(θ). The probability density function of X given a set of observations S can then be estimated by

                                                          $ p(x|S) = \int p(x,\theta |S) d\theta $

                                                                       $ =\int p(x|\theta,S)p(\theta|S)d\theta $                                                             (1)

                                                                       $ = \int p(x|\theta)p(\theta|S)d\theta $

So if we know the form of p(x|θ) with unknown parameter vector θ, then we need to estimate the weight p(θ |S), often called posterior, so as to obtain p(x|S) using Eq. (1). Based on Bayes Theorem, the posterior can be written as
                                                          $ p(\theta|S) = \frac{p(S|\theta)p(\theta)}{\int p(S|\theta)p(\theta)d\theta} $                                                             (2)

where p(θ) is called the prior distribution, or simply the prior, and p(S|θ) is called the likelihood function [1]. The prior is intended to reflect our knowledge of the parameter before we gather data, and the posterior is the updated distribution after incorporating the information from the data.
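
To make Eq. (2) concrete, here is a minimal numerical sketch, assuming Python with NumPy (the slecture itself contains no code); the flat prior and the five Bernoulli observations are made up purely for illustration.

import numpy as np

theta = np.linspace(0.001, 0.999, 999)          # grid over the parameter space (0, 1)
prior = np.ones_like(theta)                     # hypothetical flat prior p(theta)
S = [1, 0, 1, 1, 0]                             # hypothetical Bernoulli observations
likelihood = np.prod([theta**x * (1 - theta)**(1 - x) for x in S], axis=0)   # p(S|theta)

# Bayes' theorem (Eq. (2)): posterior is proportional to likelihood times prior
weights = likelihood * prior
weights /= weights.sum()                        # normalize over the grid (discrete approximation of the evidence integral)
print((theta * weights).sum())                  # approximate posterior mean of theta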



Estimating the posterior

In this section, let's start with a coin-tossing example [2]. Let S = {x1,x2,...,xn} be a set of coin-flipping observations, where xi = 1 denotes 'Head' and xi = 0 denotes 'Tail'. Assume the coin is weighted and our goal is to estimate the parameter θ, the probability of 'Head'. Suppose that we flipped the coin 20 times yesterday, but we do not remember how many times 'Head' was observed. What we do know is that the probability of 'Head' is around 1/4, but this belief is uncertain since we only did 20 trials and did not record the number of 'Heads'. With this prior information, we decide to repeat the experiment today in order to estimate the parameter θ.

A prior represents our previous knowledge or belief about the parameter θ. Based on our memory from yesterday, assume that the prior of θ follows the Beta distribution Beta(5, 15):

$ \text{Beta}(5,15) = \frac{\theta^{4}(1-\theta)^{14}}{\int\theta^{4}(1-\theta)^{14}d\theta} $
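
As a quick sanity check on this prior, the following sketch (assuming Python with SciPy; not part of the original slecture) evaluates the Beta(5, 15) distribution and confirms that its mean is 1/4, matching yesterday's belief.

from scipy.stats import beta

prior = beta(5, 15)                  # the Beta(5, 15) prior above
print(prior.mean())                  # 0.25, matching the belief that P('Head') is around 1/4
print(prior.pdf(0.25))               # prior density evaluated at theta = 0.25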

Today we flipped the same coin $n$ times and observed $y$ 'Heads'. We then compute the posterior with today's data. From Eq. (2), the posterior is

$ p(\theta|S) = \frac{p(S|\theta)p(\theta)}{\int p(S|\theta)p(\theta)d\theta} $

$ = \text{const}\times \theta^{y}(1-\theta)^{n-y}\theta^4(1-\theta)^{14} $

$ =\text{const}\times \theta^{y+4}(1-\theta)^{n-y+14} $

$ =\text{Beta}(y+5, \;n-y+15) $

Suppose that we did 500 trials today and 'Head' appeared 240 times; the posterior is then $\text{Beta}(245,275)$. Note that the posterior and the prior have the same functional form. A prior with this property is called a conjugate prior. The Beta distribution is conjugate to the binomial distribution, which gives the likelihood of i.i.d. Bernoulli trials.
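
The conjugate update can be carried out in one line, as in the following sketch (again assuming Python with SciPy); the counts n = 500 and y = 240 are the ones assumed above.

from scipy.stats import beta

a0, b0 = 5, 15                         # prior Beta(5, 15) from yesterday
n, y = 500, 240                        # today's trials and observed 'Heads'

posterior = beta(a0 + y, b0 + n - y)   # Beta(245, 275), matching the result above
print(posterior.mean())                # 245/520, approximately 0.471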

As we can see, the conjugate prior incorporates the previous information, i.e. our belief about the parameter $\theta$, into the posterior. Our knowledge about the parameter is thus updated with today's data, and the posterior obtained today can serve as the prior for tomorrow's estimation. This reveals an important property of Bayes parameter estimation: the Bayes estimator is based on the cumulative information about the unknown parameters, from both past and present data, as the sketch below illustrates.
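
A sketch of this sequential updating (assuming Python; the daily counts after the first day are hypothetical): each day's posterior Beta parameters simply become the next day's prior.

a, b = 5, 15                                   # start from the Beta(5, 15) prior
for n_day, y_day in [(500, 240), (100, 47), (60, 31)]:
    a, b = a + y_day, b + (n_day - y_day)      # today's posterior becomes tomorrow's prior
    print("posterior Beta(%d, %d), mean = %.3f" % (a, b, a / (a + b)))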


Estimating the density function of the random variable

After we obtain the posterior, we can estimate the probability density function of the random variable $X$. From Eq. (1), the density function can be expressed as

$ p(x|S) = \int p(x|\theta)p(\theta|S)d\theta $

$ =\text{const}\int \theta^x(1-\theta)^{1-x}\theta^{y+4}(1-\theta)^{n-y+14}d\theta $

$ = \left\lbrace \begin{array}{ll} \frac{y+5}{n+20} & x=1\\ \\ \frac{n-y+15}{n+20} & x=0 \end{array} \right. $

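As a check of this result, the following sketch (assuming Python; not part of the original slecture) computes the two predictive probabilities for the assumed counts n = 500, y = 240 and verifies that they sum to one.

n, y = 500, 240
p_head = (y + 5) / (n + 20)              # p(x = 1 | S) = 245/520
p_tail = (n - y + 15) / (n + 20)         # p(x = 0 | S) = 275/520
print(p_head, p_tail, p_head + p_tail)   # the two probabilities sum to 1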