Latest revision as of 23:32, 28 February 2013

Bayes Decision Theory - Introduction

Bayesian decision theory is a valuable approach to solving pattern classification problems. It is based on quantifying the tradeoffs between various classification decisions using the probabilities of the events involved and the costs that accompany the decisions. Here, we assume that the problem is posed in probabilistic terms and that all relevant probability values are known (it is important to note that in practice this is not always the case).

Consider a stack of cards in which each card is either a diamond or a spade. We denote x = x₁ for diamonds and x = x₂ for spades. Suppose we want to design a system that predicts the suit of the next card that comes up. We know the prior probability P(x₁) that the next card is a diamond and the prior probability P(x₂) that it is a spade, and the two probabilities sum to 1 (since there are only two possible states). With nothing else to go on, we can use the following decision rule: if P(x₁) > P(x₂), decide diamonds; otherwise decide spades. How well this works depends on how much greater P(x₁) is than P(x₂). If it is much greater, deciding diamonds will be correct most of the time; however, if P(x₁) = P(x₂), we have only a 50% chance of being correct.
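The prior-only rule can be sketched in a few lines of Python. This is a minimal illustration, not part of the original page: the function name `decide_from_priors` and the example priors are assumptions chosen for the demonstration. It returns the decision together with the probability that the decision is correct, which is simply the larger of the two priors.

```python
from typing import Tuple


def decide_from_priors(p_diamond: float, p_spade: float) -> Tuple[str, float]:
    """Return the prior-only decision and its probability of being correct."""
    if abs(p_diamond + p_spade - 1.0) > 1e-9:
        raise ValueError("the two priors must sum to 1")
    # Always pick the more probable state; ties go to spades, matching the
    # "otherwise decide spades" branch of the rule in the text.
    if p_diamond > p_spade:
        return "diamonds", p_diamond
    return "spades", p_spade


# Hypothetical priors: one lopsided stack and one evenly split stack.
for priors in [(0.9, 0.1), (0.5, 0.5)]:
    decision, p_correct = decide_from_priors(*priors)
    print(priors, decision, p_correct)
```

Because the rule ignores everything except the priors, it makes the same decision for every card, which is why its accuracy can never exceed the larger prior.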

In most cases, however, we won't be making decisions with so little information. For example, suppose we have information about the color value of the shapes on the cards (the value of a color refers to its degree of lightness or darkness). We can describe this as a variable y and treat y as a random variable whose distribution depends on the state of the card, expressed as p(y|x). This is called the class-conditional probability density function: the probability density of y given that the state is x. The equation for the conditional probability is given as:

$P(y|x)= \frac{P(xy)}{P(x)} \qquad\qquad\qquad\qquad\qquad (1)$

and

$P(x|y)= \frac{P(xy)}{P(y)} \qquad\qquad\qquad\qquad\qquad (2)$

The difference between p(y|x₁) and p(y|x₂) describes the difference in color values between the diamonds and the spades in the stack of cards. Suppose we know both the prior probability P(x_j) and the class-conditional density p(y|x_j) for j = 1, 2. If we also measure the color value of the card as y, we can equate the joint probability in equations (1) and (2) and rearrange to obtain Bayes' formula, which is:

$P(x_j|y)= \frac{p(y|x_j)P(x_j)}{P(y)} \qquad\qquad\qquad\qquad (3)$

In words, the formula above can be expressed as

$posterior = \frac{likelihood \times prior}{evidence}$
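Formula (3) can be tried out numerically. The sketch below is only an illustration under stated assumptions: it models each class-conditional density p(y|x_j) as a Gaussian, and the priors, means, and standard deviations are made-up values, none of which come from the original page. The evidence is computed as the sum of likelihood times prior over both states, which is what makes the posteriors sum to 1.

```python
import math


def gaussian_pdf(y: float, mean: float, std: float) -> float:
    """Density of a normal distribution, used here as an assumed p(y|x_j)."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posteriors(y, priors, means, stds):
    """Apply Bayes' formula (3) and return [P(x_1|y), P(x_2|y)]."""
    # Numerator of (3) for each state x_j: likelihood * prior.
    joint = [gaussian_pdf(y, m, s) * p for p, m, s in zip(priors, means, stds)]
    # Evidence: the sum of the numerators over all states.
    evidence = sum(joint)
    return [j / evidence for j in joint]


# Hypothetical numbers: diamonds (x_1) tend to be lighter than spades (x_2).
priors = [0.6, 0.4]
means = [0.7, 0.3]   # assumed mean color value for each suit
stds = [0.15, 0.15]  # assumed spread of the color value

post = posteriors(0.4, priors, means, stds)
print(post)
```

With these assumed values, an observation of y = 0.4 lies much closer to the spade mean, so the posterior favors spades even though the prior favors diamonds.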

Bayes' formula shows that by observing the value of y, we can convert the prior probability P(x_j) into the posterior probability P(x_j|y), the probability of x_j given that the feature value y has been measured. If we observe y and find that P(x₁|y) is greater than P(x₂|y), we choose diamonds; conversely, if P(x₂|y) is greater than P(x₁|y), we choose spades. To justify this procedure, we can calculate the probability of error when we make a decision. Whenever we observe a particular y, the probability of error is

$P(error|y)= \begin{cases} P(x_1|y) \qquad if\ we\ decide\ x_2\\ P(x_2|y) \qquad if\ we\ decide\ x_1 \end{cases}$

Clearly, for a given value of y, we can minimize the probability of error by deciding x₁ when P(x₁|y) > P(x₂|y) and x₂ otherwise, so that P(error|y) is as small as possible. Under this rule, P(error|y) can be rewritten as:

$P(error|y)= \min[P(x_1|y),P(x_2|y)]$
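As a worked illustration with made-up numbers (these values are assumptions for the example, not measurements), suppose P(x₁) = 0.6 and P(x₂) = 0.4, and that at the observed color value y the class-conditional densities are p(y|x₁) = 0.2 and p(y|x₂) = 0.7. The evidence is the sum of likelihood times prior over both states:

$P(y) = p(y|x_1)P(x_1) + p(y|x_2)P(x_2) = 0.2 \times 0.6 + 0.7 \times 0.4 = 0.40$

$P(x_1|y) = \frac{0.12}{0.40} = 0.3, \qquad P(x_2|y) = \frac{0.28}{0.40} = 0.7$

$P(error|y) = \min[0.3,\ 0.7] = 0.3$

So for this y we decide spades and are wrong with probability 0.3, whereas deciding diamonds, as the priors alone would suggest, would be wrong with probability 0.7.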

In general, both the prior probability and the class-conditional density of an additional measured feature are important in making a decision, and Bayes' theorem combines them into a decision rule that achieves the minimum probability of error.


Difference between revisions of "Bayesian Decision Theory" - Rhea
