
Revision as of 14:34, 1 March 2013

Bayes Decision Theory - Continuous Features


Continuing from the last essay, we will now improve on the model in the following ways:

  • Allowing the use of more than one feature - Like adding the shape of the cards as another feature.
  • Allowing more than two states of nature - Having a deck also containing clubs and hearts.
  • Allowing actions beyond just deciding the state of nature of the cards.
  • Introducing a loss function.

       Allowing the use of more than one feature just means that we replace the scalar y with the feature vector Y, where Y lies in a d-dimensional Euclidean space $ R^d $, called the feature space. Allowing more than two states of nature provides a useful generalization at small notational expense. Allowing more actions also opens up the possibility of rejection, i.e., refusing to make a decision in cases that are too close to call. This can be very useful if being indecisive is not too costly. The loss function states exactly how costly each action is, and is used to convert a probability determination into a decision. It enables us to consider situations where certain errors are more costly than others, although we will often only look at cases where all errors are equally costly.

       Putting this together, let {x1,...,xc} be the finite set of c states of nature and let {k1,...,ka} be the finite set of a possible actions. The loss function λ(ki|xj) describes the loss incurred for taking action ki when the state of nature is xj. Let Y be a d-component vector-valued random variable, and let p(Y|xj) be the conditional probability density function for Y when xj is the true state of nature. As discussed before, P(xj) is the prior probability that nature is in state xj; therefore, by Bayes' formula we can find the posterior probability P(xj|Y):

$ P(x_j|\mathbf{Y})= \frac{p(\mathbf{Y}|x_j)P(x_j)}{P(\mathbf{Y})} \qquad\qquad\qquad\qquad (1) $

where


$ P(\mathbf{Y})= \sum_{j=1}^c p(\mathbf{Y}|x_j)P(x_j) \qquad\qquad\qquad\qquad (2) $
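As a concrete numeric sketch of equations (1) and (2), consider a two-state version of the card example. The priors and the likelihood values below are hypothetical numbers chosen only for illustration:

```python
# Hypothetical example: two states of nature with priors P(x_j), and the
# class-conditional densities p(Y|x_j) evaluated at one observed feature
# vector Y. All numbers here are made up for illustration.
priors = [0.6, 0.4]          # P(x_1), P(x_2)
likelihoods = [0.3, 0.7]     # p(Y|x_1), p(Y|x_2) at the observed Y

# Equation (2): the evidence P(Y), by the law of total probability
p_Y = sum(l * p for l, p in zip(likelihoods, priors))

# Equation (1): posterior P(x_j|Y) for each state of nature
posteriors = [l * p / p_Y for l, p in zip(likelihoods, priors)]
print(posteriors)
```

Note that dividing by P(Y) in equation (1) is what makes the posteriors sum to 1 over the c states of nature.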

       Now, suppose we observe a particular feature vector Y and decide to take action ki. If the state of nature is xj, then by the definition of the loss function above we incur the loss λ(ki|xj). Because P(xj|Y) is the probability that the true state of nature is xj, the expected loss associated with taking action ki is:

$ R(k_i|\mathbf{Y})= \sum_{j=1}^c \lambda(k_i|x_j)P(x_j|\mathbf{Y}) \qquad\qquad\qquad\qquad (3) $
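Equation (3) can be sketched numerically. The posteriors and the loss matrix below are hypothetical, with errors given unequal costs to show why the loss function matters; the action minimizing this conditional risk is the Bayes decision:

```python
# Hypothetical posteriors P(x_j|Y) for two states of nature
posteriors = [0.39, 0.61]

# Loss matrix: loss_matrix[i][j] = lambda(k_i|x_j), the loss incurred for
# taking action k_i when the true state of nature is x_j. Here a mistake
# on state x_2 (cost 2) is deemed costlier than one on x_1 (cost 1).
loss_matrix = [
    [0.0, 2.0],   # action k_1: free if truly x_1, costly if truly x_2
    [1.0, 0.0],   # action k_2: costs 1 if truly x_1, free if truly x_2
]

# Equation (3): conditional risk R(k_i|Y) = sum_j lambda(k_i|x_j) P(x_j|Y)
risks = [sum(l * p for l, p in zip(row, posteriors)) for row in loss_matrix]

# Take the action with minimum conditional risk
best_action = min(range(len(risks)), key=lambda i: risks[i])
print(risks, best_action)
```

With equal costs for all errors, this rule reduces to simply picking the state with the highest posterior.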
