Difference between revisions of "EE662Sp10OptimalPrediction" - Rhea

Revision as of 10:28, 27 January 2010

I wanted to understand why picking the most likely is the best you can do (better than choosing randomly from an identical distribution) so I worked it out as follows.

Consider a random experiment with 2 outcomes, 0 and 1.

Let $$ E_0 $$ be the event that outcome 0 occurs and $$ E_1 $$ be the event that outcome 1 occurs.

Let $$ Pr(E_0) = p $$ and $$ Pr(E_1) = 1-p $$ , where $$ p $$ is some fixed but arbitrary probability.

Assume, without loss of generality, that $p \ge 0.5$ (relabelling the outcomes if necessary).

Consider a joint, independent random experiment intended to predict the outcome of the first.

Let $$ F_0 $$ be the event that outcome 0 is predicted and $$ F_1 $$ be the event that outcome 1 is predicted.

Let $$ Pr(F_0) = q $$ and $$ Pr(F_1) = 1-q $$

The probability of error is $P_{err} = Pr(\{(E_0, F_1),(E_1, F_0)\}) = Pr(\{(E_0, F_1)\}) + Pr(\{(E_1, F_0)\})$ .

By independence, $Pr(\{(E_0, F_1)\}) = Pr(\{E_0\}) Pr(\{F_1\})$ and $Pr(\{(E_1, F_0)\}) = Pr(\{E_1\}) Pr(\{F_0\})$ .

So, $P_{err} = p(1-q) + (1-p)q = p - 2pq + q = (1-2p)q + p$ .

We would like to choose $q\in[0,1]$ to minimize $P_{err}$ . Since $P_{err}$ is linear in $$ q $$ , the extrema are at the endpoints. Hence, evaluating at $$ q=0 $$ and $$ q=1 $$ , the minimal $P_{err}$ is $$ 1-p $$ at $$ q=1 $$ . Thus the optimal strategy for predicting the outcome of the first experiment is to always (with probability 1) predict the more likely outcome.

Futhermore, on the interval $p\in[\frac{1}{2}, 1]$ , $P_{err}$ is a strictly decreasing function. That is, the closer $$ p $$ is to $\frac{1}{2}$ , the worse it can be predicted (the higher $P_{err}$ is), and the farther $$ p $$ is from $\frac{1}{2}$ the better it can be predicted. This is consistent with the information theoretic description of entropy (which has its maximum at $p=\frac{1}{2}$ ) as the "average uncertainty in the outcome". Clearly the less uncertain the outcome is, the better we should expect to be able to predict it.

As a concrete example, consider two approaches for predicting an experiment with $$ p=.8 $$ (i.e. $$ E_0 $$ occurs with probability .8 and $$ E_1 $$ occurs with probability .2). In the first approach we always predict $$ E_0 $$ (hence $$ q = Pr(F_0) = 1, Pr(F_1) = 0 $$ ). With this approach we have $Pr(\{(E_0, F_0)\}) = .8$ , $Pr(\{(E_0, F_1)\}) = 0$ , $Pr(\{(E_1, F_0)\}) = .2$ , $Pr(\{(E_1, F_1)\}) = 0$ . So $P_{err} = .2$ .

In the second approach we predict randomly according to the distribution of the first experiment (i.e. q = Pr(F_0) = .8, Pr(F_1) = .2). With this approach we have $Pr(\{(E_0, F_0)\}) = .64$ , $Pr(\{(E_0, F_1)\}) = .16$ , $Pr(\{(E_1, F_0)\}) = .16$ , $Pr(\{(E_1, F_1)\}) = .04$ . So $P_{err} = .32$ , substantially worse.

--Jvaught 19:59, 26 January 2010 (UTC)

@@ Line 3: / Line 3: @@
 so I worked it out as follows.
+Consider a random experiment with 2 outcomes, 0 and 1.
-Consider a random experiment with 2 outcomes.  Let <math>E_0</math>
+Let <math>E_0</math> be the event that outcome 0 occurs and <math>E_1</math> be the event that outcome 1 occurs.
-which occurs with fixed but arbitrary probability <math>p</math> be
-the event that outcome 0 occurs and <math>E_1</math> which occurs
-with probability <math>1-p</math> be the event that outcome 1 occurs.
-Assume without loss of generality that <math>p \ge \frac{1}{2}</math>
-(relabelling the outcomes if necessary).
+Let <math>Pr(E_0) = p</math> and <math>Pr(E_1) = 1-p</math>, where <math>p</math> is some fixed but arbitrary probability.
-Consider a joint, independent random experiment intended to
+Assume, without loss of generality, that <math>p \ge 0.5</math> (relabelling the outcomes if necessary).
-predict the outcome of the first.  Let <math>F_0</math> be the
-event that outcome 0 is predicted and <math>F_1</math> be the
-event that outcome 1 is predicted.  Let <math>q</math> be the
-probability of <math>F_0</math>.
-Then the probability of error <math>P_{err} = Pr(\{(E_0, F_1),
+Consider a joint, independent random experiment intended to predict the outcome of the first.
-(E_1, F_0)\}) = Pr(\{(E_0, F_1)\}) + Pr(\{(E_1, F_0)\})</math>.
-By independence <math>Pr(\{(E_0, F_1)\}) = Pr(\{E_0\}) \cdot
+Let <math>F_0</math> be the event that outcome 0 is predicted and <math>F_1</math> be the event that outcome 1 is predicted.
-Pr(\{F_1\})</math> and <math>Pr(\{(E_1, F_0)\}) = Pr(\{E_1\})
-\cdot Pr(\{F_0\})</math>.  So <math>P_{err} = p(1-q) + (1-p)q
+Let <math>Pr(F_0) = q</math> and <math>Pr(F_1) = 1-q</math>
-= p - 2pq + q = (1-2p)q + p</math>.
+The probability of error is
+<math>P_{err} = Pr(\{(E_0, F_1),(E_1, F_0)\}) = Pr(\{(E_0, F_1)\}) + Pr(\{(E_1, F_0)\})</math>.
+By independence,
+<math>Pr(\{(E_0, F_1)\}) = Pr(\{E_0\})  Pr(\{F_1\})</math>
+and
+<math>Pr(\{(E_1, F_0)\}) = Pr(\{E_1\})  Pr(\{F_0\})</math>.
+So,
+<math>P_{err} = p(1-q) + (1-p)q = p - 2pq + q = (1-2p)q + p</math>.

Difference between revisions of "EE662Sp10OptimalPrediction" - Rhea

Revision as of 10:28, 27 January 2010

Alumni Liaison