Line 7: Line 7:
 
In this experiment, we decided to test the Bayes decision rule on two <span class="texhtml">2 − ''D''</span> classes with severe imbalance between them. Prior probabilities for both classes were defined as <span class="texhtml">''P''(ω<sub>1</sub>) = 0.99999</span>&nbsp;and&nbsp;<math>P(\omega_2)=1 \times 10^{-5}</math>. The mean of one feature of class <span class="texhtml">ω<sub>2</sub></span>&nbsp;was changed and then the error rate computed. This was repeated for different values of the mean.  
 
In this experiment, we decided to test the Bayes decision rule on two <span class="texhtml">2 − ''D''</span> classes with severe imbalance between them. Prior probabilities for both classes were defined as <span class="texhtml">''P''(ω<sub>1</sub>) = 0.99999</span>&nbsp;and&nbsp;<math>P(\omega_2)=1 \times 10^{-5}</math>. The mean of one feature of class <span class="texhtml">ω<sub>2</sub></span>&nbsp;was changed and then the error rate computed. This was repeated for different values of the mean.  
  
 
+
<br>
  
 
Looking at the results, the class imbalance makes it more difficult to accurately determine the correct class of a particular event or object. In Figure 6, the mean of feature in class is μ2a = 8 so classes are highly uncorrelated. Problem with class imbalance is shown in Figure 7, where μ2a = 3 and classifier incorrectly selects a class ω2 object as class ω1 . As there are significantly fewer occurances of class ω2 than ω1 in the universe under consideration, mistakes are most costly. Also, to correctly classify each of these occurances, one might be tempted to increase the sensitivity of the classifier which would (unfortunately) also increase the number of false positive cases. At the end, this problem is not caused by the Bayesian decision rule. But one as a researcher should be aware that additional techniques should be considered to effectively classify objects under the class imbalance scenario.  
 
Looking at the results, the class imbalance makes it more difficult to accurately determine the correct class of a particular event or object. In Figure 6, the mean of feature in class is μ2a = 8 so classes are highly uncorrelated. Problem with class imbalance is shown in Figure 7, where μ2a = 3 and classifier incorrectly selects a class ω2 object as class ω1 . As there are significantly fewer occurances of class ω2 than ω1 in the universe under consideration, mistakes are most costly. Also, to correctly classify each of these occurances, one might be tempted to increase the sensitivity of the classifier which would (unfortunately) also increase the number of false positive cases. At the end, this problem is not caused by the Bayesian decision rule. But one as a researcher should be aware that additional techniques should be considered to effectively classify objects under the class imbalance scenario.  
Line 15: Line 15:
 
== References  ==
 
== References  ==
  
[[|[1]]] S. Axelsson: The base-rate fallacy and the difficulty of intrusion detection. In: ACM Trans. Inf. Syst. Secur., pp. 186–205. ACM, New York, 2000.  
+
[1] S. Axelsson: The base-rate fallacy and the difficulty of intrusion detection. In: ACM Trans. Inf. Syst. Secur., pp. 186–205. ACM, New York, 2000.  
  
[[|[2]]] C. Drummond and R. Holte: Severe Class Imbalance: Why Better Algorithms Aren’t the Answer. In: J. Gama et al. (Eds.): ECML 2005, LNAI 3720, pp. 539–546. Springer-Verlag, Berlin Heidelberg, 2005.
+
[2] C. Drummond and R. Holte: Severe Class Imbalance: Why Better Algorithms Aren’t the Answer. In: J. Gama et al. (Eds.): ECML 2005, LNAI 3720, pp. 539–546. Springer-Verlag, Berlin Heidelberg, 2005.

Revision as of 09:44, 13 April 2010

Bayes decision rule is a simple, intuitive and powerful classifier. It allows to select the most likely class, given the information gathered. Here also lies an important limitation of this classifier, as relies heavily on the probability values associated to each class. In many real world scenarios, this might not be completely possible requires.

An interesting experiment to test Bayes decision rule involves an scenario faced in my area of research, intrusion detection (in the context of computer systems). In this scenario, there exists a severe class imbalance between what is considered normal and malicious traffic (attacks). The prior probabilities of the classes can be several orders of magnitude different between them. Also, the problem worsens as the inter-class features can be highly correlated. Attackers try to hide behind normal traffic, therefore their malicious traffic correlates to the normal traffic.

Further information on this problem, known as the Bayes-Rate Fallacy, related to the intrusion detection field, is provided in [1]. Nevertheless, this problem is not exclusive to the computer security field, and a more general discussion on the class imbalance problem has been given by the statistical pattern recognition community, as shown in [2].

In this experiment, we decided to test the Bayes decision rule on two 2 − D classes with severe imbalance between them. Prior probabilities for both classes were defined as P1) = 0.99999 and $ P(\omega_2)=1 \times 10^{-5} $. The mean of one feature of class ω2 was changed and then the error rate computed. This was repeated for different values of the mean.


Looking at the results, the class imbalance makes it more difficult to accurately determine the correct class of a particular event or object. In Figure 6, the mean of feature in class is μ2a = 8 so classes are highly uncorrelated. Problem with class imbalance is shown in Figure 7, where μ2a = 3 and classifier incorrectly selects a class ω2 object as class ω1 . As there are significantly fewer occurances of class ω2 than ω1 in the universe under consideration, mistakes are most costly. Also, to correctly classify each of these occurances, one might be tempted to increase the sensitivity of the classifier which would (unfortunately) also increase the number of false positive cases. At the end, this problem is not caused by the Bayesian decision rule. But one as a researcher should be aware that additional techniques should be considered to effectively classify objects under the class imbalance scenario.

--Gmodeloh 13:32, 13 April 2010 (UTC)

References

[1] S. Axelsson: The base-rate fallacy and the difficulty of intrusion detection. In: ACM Trans. Inf. Syst. Secur., pp. 186–205. ACM, New York, 2000.

[2] C. Drummond and R. Holte: Severe Class Imbalance: Why Better Algorithms Aren’t the Answer. In: J. Gama et al. (Eds.): ECML 2005, LNAI 3720, pp. 539–546. Springer-Verlag, Berlin Heidelberg, 2005.

Alumni Liaison

Basic linear algebra uncovers and clarifies very important geometry and algebra.

Dr. Paul Garrett