Revision as of 15:59, 26 January 2010 by Jvaught (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

I wanted to understand why picking the most likely is the best you can do (better than choosing randomly from an identical distribution) so I worked it out as follows.


Consider a random experiment with 2 outcomes. Let $ E_0 $ which occurs with fixed but arbitrary probability $ p $ be the event that outcome 0 occurs and $ E_1 $ which occurs with probability $ 1-p $ be the event that outcome 1 occurs. Assume without loss of generality that $ p \ge \frac{1}{2} $ (relabelling the outcomes if necessary).


Consider a joint, independent random experiment intended to predict the outcome of the first. Let $ F_0 $ be the event that outcome 0 is predicted and $ F_1 $ be the event that outcome 1 is predicted. Let $ q $ be the probability of $ F_0 $.


Then the probability of error $ P_{err} = Pr(\{(E_0, F_1), (E_1, F_0)\}) = Pr(\{(E_0, F_1)\}) + Pr(\{(E_1, F_0)\}) $. By independence $ Pr(\{(E_0, F_1)\}) = Pr(\{E_0\}) \cdot Pr(\{F_1\}) $ and $ Pr(\{(E_1, F_0)\}) = Pr(\{E_1\}) \cdot Pr(\{F_0\}) $. So $ P_{err} = p(1-q) + (1-p)q = p - 2pq + q = (1-2p)q + p $.


We would like to choose $ q\in[0,1] $ to minimize $ P_{err} $. Since $ P_{err} $ is linear in $ q $, the extrema are at the endpoints. Hence, evaluating at $ q=0 $ and $ q=1 $, the minimal $ P_{err} $ is $ 1-p $ at $ q=1 $. Thus the optimal strategy for predicting the outcome of the first experiment is to always (with probability 1) predict the more likely outcome.


Futhermore, on the interval $ p\in[\frac{1}{2}, 1] $, $ P_{err} $ is a strictly decreasing function. That is, the closer $ p $ is to $ \frac{1}{2} $, the worse it can be predicted (the higher $ P_{err} $ is), and the farther $ p $ is from $ \frac{1}{2} $ the better it can be predicted. This is consistent with the information theoretic description of entropy (which has its maximum at $ p=\frac{1}{2} $) as the "average uncertainty in the outcome". Clearly the less uncertain the outcome is, the better we should expect to be able to predict it.


As a concrete example, consider two approaches for predicting an experiment with $ p=.8 $ (i.e. $ E_0 $ occurs with probability .8 and $ E_1 $ occurs with probability .2). In the first approach we always predict $ E_0 $ (hence $ q = Pr(F_0) = 1, Pr(F_1) = 0 $). With this approach we have $ Pr(\{(E_0, F_0)\}) = .8 $, $ Pr(\{(E_0, F_1)\}) = 0 $, $ Pr(\{(E_1, F_0)\}) = .2 $, $ Pr(\{(E_1, F_1)\}) = 0 $. So $ P_{err} = .2 $.

In the second approach we predict randomly according to the distribution of the first experiment (i.e. q = Pr(F_0) = .8, Pr(F_1) = .2). With this approach we have $ Pr(\{(E_0, F_0)\}) = .64 $, $ Pr(\{(E_0, F_1)\}) = .16 $, $ Pr(\{(E_1, F_0)\}) = .16 $, $ Pr(\{(E_1, F_1)\}) = .04 $. So $ P_{err} = .32 $, substantially worse.

--Jvaught 19:59, 26 January 2010 (UTC)

Alumni Liaison

Basic linear algebra uncovers and clarifies very important geometry and algebra.

Dr. Paul Garrett