EE662Sp10OptimalPrediction - Rhea

I wanted to understand why always picking the most likely outcome is the best you can do (better than choosing randomly from an identical distribution), so I worked it out as follows.

Consider a random experiment with 2 outcomes, 0 and 1.

Let $\displaystyle E_0$ be the event that outcome 0 occurs and $\displaystyle E_1$ be the event that outcome 1 occurs.

Let $\displaystyle Pr(E_0) = p$ and $\displaystyle Pr(E_1) = 1-p$ , where $\displaystyle p$ is some fixed but arbitrary probability.

Assume, without loss of generality, that $\displaystyle p \ge \frac{1}{2}$ (relabelling the outcomes if necessary).

Consider a second random experiment, independent of the first, whose outcome is used as a prediction of the outcome of the first.

Let $\displaystyle F_0$ be the event that outcome 0 is predicted and $\displaystyle F_1$ be the event that outcome 1 is predicted.

Let $\displaystyle Pr(F_0) = q$ and $\displaystyle Pr(F_1) = 1-q$ , where $\displaystyle q\in[0,1]$ is the parameter we are free to choose.

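An error occurs when the predicted outcome differs from the actual outcome. Since the two ways of making an error are disjoint events, the probability of error is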

$\displaystyle P_{err} = Pr((E_0 \cap F_1) \cup (E_1 \cap F_0)) = Pr(E_0 \cap F_1) + Pr(E_1 \cap F_0)$ .

By independence,

$\displaystyle Pr(E_0 \cap F_1) = Pr(E_0) \cdot Pr(F_1)$

$\displaystyle Pr(E_1 \cap F_0) = Pr(E_1) \cdot Pr(F_0)$ .

So,

$\displaystyle P_{err} = Pr(E_0) \cdot Pr(F_1) + Pr(E_1) \cdot Pr(F_0) = p(1-q) + (1-p)q$ .
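As a sanity check on the formula above, here is a minimal Python sketch that computes $\displaystyle P_{err}$ two ways: directly from the closed form, and by summing the off-diagonal entries of the joint table built under the independence assumption. The function names are my own and purely illustrative.

```python
def error_probability(p, q):
    """Closed form: P_err = p(1-q) + (1-p)q."""
    if not (0.0 <= p <= 1.0 and 0.0 <= q <= 1.0):
        raise ValueError("p and q must be probabilities in [0, 1]")
    return p * (1.0 - q) + (1.0 - p) * q


def error_probability_from_joint(p, q):
    """Same quantity, summed over the joint table of (outcome, prediction)."""
    outcome = {0: p, 1: 1.0 - p}        # Pr(E_0), Pr(E_1)
    prediction = {0: q, 1: 1.0 - q}     # Pr(F_0), Pr(F_1)
    # Independence: each joint probability is the product of the marginals.
    # An error is any cell where the prediction differs from the outcome.
    return sum(outcome[e] * prediction[f]
               for e in outcome
               for f in prediction
               if e != f)
```

Both functions should agree for any valid pair, e.g. `error_probability(0.8, 0.8)` and `error_probability_from_joint(0.8, 0.8)`.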

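We would like to choose $\displaystyle q\in[0,1]$ to minimize $\displaystyle P_{err}$ . Rearranging, $\displaystyle P_{err} = p + (1-2p)q$ , which is linear in $\displaystyle q$ with slope $\displaystyle 1-2p \le 0$ (since $\displaystyle p \ge \frac{1}{2}$ ), so the extrema are at the endpoints. Evaluating, $\displaystyle P_{err} = p$ at $\displaystyle q=0$ and $\displaystyle P_{err} = 1-p$ at $\displaystyle q=1$ , so the minimal $\displaystyle P_{err}$ is $\displaystyle 1-p$ , attained at $\displaystyle q=1$ . (If $\displaystyle p = \frac{1}{2}$ the slope is zero and every $\displaystyle q$ gives $\displaystyle P_{err} = \frac{1}{2}$ .) Thus the optimal strategy for predicting the outcome of the first experiment is to always (with probability 1) predict the more likely outcome.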
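The endpoint argument can also be checked numerically. The sketch below scans a grid of $\displaystyle q$ values and returns the one with the smallest error. The helper name `best_q_on_grid` and the grid resolution are my own choices for illustration, not part of the derivation.

```python
def error_probability(p, q):
    """P_err = p(1-q) + (1-p)q."""
    return p * (1 - q) + (1 - p) * q


def best_q_on_grid(p, steps=100):
    """Return the grid point q in {0, 1/steps, ..., 1} that minimizes P_err."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda q: error_probability(p, q))


# For any p strictly greater than 1/2 the minimizer should be q = 1,
# i.e. always predict outcome 0, the more likely one.
for p in (0.6, 0.8, 0.95):
    q_star = best_q_on_grid(p)
    print(f"p={p}: best q={q_star}, P_err={error_probability(p, q_star):.3f}")
```

I left $\displaystyle p = \frac{1}{2}$ out of the loop on purpose: there every $\displaystyle q$ gives the same error, so the grid minimizer is not meaningful.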

Furthermore, on the interval $\displaystyle p\in[\frac{1}{2}, 1]$ , the minimal error $\displaystyle P_{err} = 1-p$ is a strictly decreasing function of $\displaystyle p$ . That is, the closer $\displaystyle p$ is to $\displaystyle \frac{1}{2}$ , the worse the outcome can be predicted (the higher $\displaystyle P_{err}$ is), and the farther $\displaystyle p$ is from $\displaystyle \frac{1}{2}$ , the better it can be predicted. This is consistent with the information-theoretic description of entropy (which has its maximum at $\displaystyle p=\frac{1}{2}$ ) as the "average uncertainty in the outcome". Clearly the less uncertain the outcome is, the better we should expect to be able to predict it.
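To see the parallel with entropy side by side, here is a small sketch that tabulates the minimal error and the binary entropy for a few values of $\displaystyle p$ . I am assuming entropy is measured in bits (base-2 logarithm); the function names are mine.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a two-outcome experiment with Pr(E_0) = p."""
    if p in (0.0, 1.0):
        return 0.0  # by convention 0 * log(0) = 0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def minimal_error(p):
    """Error of the 'always predict the more likely outcome' strategy."""
    return min(p, 1 - p)


# Both columns shrink together as p moves away from 1/2.
rows = [(p, minimal_error(p), binary_entropy(p))
        for p in (0.5, 0.6, 0.7, 0.8, 0.9, 1.0)]
for p, err, h in rows:
    print(f"p={p:.1f}  min P_err={err:.2f}  H(p)={h:.3f} bits")
```

The two quantities are not proportional, but they rank the experiments the same way on this interval: highest at $\displaystyle p=\frac{1}{2}$ and zero at $\displaystyle p=1$ .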

As a concrete example, consider two approaches for predicting an experiment with $\displaystyle p=.8$ (i.e. $\displaystyle E_0$ occurs with probability .8 and $\displaystyle E_1$ occurs with probability .2). In the first approach we always predict $\displaystyle E_0$ (hence $\displaystyle q = Pr(F_0) = 1, Pr(F_1) = 0$ ). With this approach we have $\displaystyle Pr(\{(E_0, F_0)\}) = .8$ , $\displaystyle Pr(\{(E_0, F_1)\}) = 0$ , $\displaystyle Pr(\{(E_1, F_0)\}) = .2$ , $\displaystyle Pr(\{(E_1, F_1)\}) = 0$ . So $\displaystyle P_{err} = .2$ .

In the second approach we predict randomly according to the distribution of the first experiment (i.e. $\displaystyle q = Pr(F_0) = .8, Pr(F_1) = .2$ ). With this approach we have $\displaystyle Pr(\{(E_0, F_0)\}) = .64$ , $\displaystyle Pr(\{(E_0, F_1)\}) = .16$ , $\displaystyle Pr(\{(E_1, F_0)\}) = .16$ , $\displaystyle Pr(\{(E_1, F_1)\}) = .04$ . So $\displaystyle P_{err} = .32$ , substantially worse.
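The two approaches can also be compared empirically. The following is a rough Monte Carlo sketch, assuming Python's standard `random` module as the source of randomness; the trial count and seed are arbitrary choices of mine, and the estimates should only be expected to land near .2 and .32, not on them exactly.

```python
import random


def simulate_error_rate(p, q, trials=200_000, seed=0):
    """Estimate P_err by drawing independent outcomes and predictions."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    errors = 0
    for _ in range(trials):
        outcome = 0 if rng.random() < p else 1      # E_0 with probability p
        prediction = 0 if rng.random() < q else 1   # F_0 with probability q
        errors += outcome != prediction
    return errors / trials


# First approach: always predict the more likely outcome (q = 1).
always_most_likely = simulate_error_rate(p=0.8, q=1.0)
# Second approach: predict with the same distribution as the experiment (q = p).
match_distribution = simulate_error_rate(p=0.8, q=0.8)

print(f"always predict E_0:      {always_most_likely:.3f}")
print(f"match the distribution:  {match_distribution:.3f}")
```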

--Jvaught 19:59, 26 January 2010 (UTC)
