EE662Sp10OptimalPrediction - Rhea

I wanted to understand why picking the most likely is the best you can do (better than choosing randomly from an identical distribution) so I worked it out as follows.

Consider a random experiment with 2 outcomes, 0 and 1.

Let $$ E_0 $$ be the event that outcome 0 occurs and $$ E_1 $$ be the event that outcome 1 occurs.

Let $$ Pr(E_0) = p $$ and $$ Pr(E_1) = 1-p $$ , where $$ p $$ is some fixed but arbitrary probability.

Assume, without loss of generality, that $p \ge 0.5$ (relabelling the outcomes if necessary).

Consider a joint, independent random experiment intended to predict the outcome of the first.

Let $$ F_0 $$ be the event that outcome 0 is predicted and $$ F_1 $$ be the event that outcome 1 is predicted.

Let $$ Pr(F_0) = q $$ and $$ Pr(F_1) = 1-q $$

The probability of error is $P_{err} = Pr(\{(E_0, F_1),(E_1, F_0)\}) = Pr(\{(E_0, F_1)\}) + Pr(\{(E_1, F_0)\})$ .

By independence, $Pr(\{(E_0, F_1)\}) = Pr(\{E_0\}) Pr(\{F_1\})$ and $Pr(\{(E_1, F_0)\}) = Pr(\{E_1\}) Pr(\{F_0\})$ .

So, $P_{err} = p(1-q) + (1-p)q = p - 2pq + q = (1-2p)q + p$ .

We would like to choose $q\in[0,1]$ to minimize $P_{err}$ . Since $P_{err}$ is linear in $$ q $$ , the extrema are at the endpoints. Hence, evaluating at $$ q=0 $$ and $$ q=1 $$ , the minimal $P_{err}$ is $$ 1-p $$ at $$ q=1 $$ . Thus the optimal strategy for predicting the outcome of the first experiment is to always (with probability 1) predict the more likely outcome.

Futhermore, on the interval $p\in[\frac{1}{2}, 1]$ , $P_{err}$ is a strictly decreasing function. That is, the closer $$ p $$ is to $\frac{1}{2}$ , the worse it can be predicted (the higher $P_{err}$ is), and the farther $$ p $$ is from $\frac{1}{2}$ the better it can be predicted. This is consistent with the information theoretic description of entropy (which has its maximum at $p=\frac{1}{2}$ ) as the "average uncertainty in the outcome". Clearly the less uncertain the outcome is, the better we should expect to be able to predict it.

As a concrete example, consider two approaches for predicting an experiment with $$ p=.8 $$ (i.e. $$ E_0 $$ occurs with probability .8 and $$ E_1 $$ occurs with probability .2). In the first approach we always predict $$ E_0 $$ (hence $$ q = Pr(F_0) = 1, Pr(F_1) = 0 $$ ). With this approach we have $Pr(\{(E_0, F_0)\}) = .8$ , $Pr(\{(E_0, F_1)\}) = 0$ , $Pr(\{(E_1, F_0)\}) = .2$ , $Pr(\{(E_1, F_1)\}) = 0$ . So $P_{err} = .2$ .

In the second approach we predict randomly according to the distribution of the first experiment (i.e. q = Pr(F_0) = .8, Pr(F_1) = .2). With this approach we have $Pr(\{(E_0, F_0)\}) = .64$ , $Pr(\{(E_0, F_1)\}) = .16$ , $Pr(\{(E_1, F_0)\}) = .16$ , $Pr(\{(E_1, F_1)\}) = .04$ . So $P_{err} = .32$ , substantially worse.

--Jvaught 19:59, 26 January 2010 (UTC)

EE662Sp10OptimalPrediction - Rhea

Alumni Liaison