Netzwerk Phänomenologische Metaphysik


(2002) Progress in Discovery Science, Dordrecht, Springer.

Computing optimal hypotheses efficiently for boosting

Shinichi Morishita

pp. 471-481

This paper sheds light on a strong connection between AdaBoost and several optimization algorithms for data mining. AdaBoost has attracted much interest as an effective methodology for classification tasks. AdaBoost repeatedly generates one hypothesis in each round and finally makes a highly accurate prediction by taking a weighted majority vote over the resulting hypotheses. Freund and Schapire remarked that using simple hypotheses, such as single-test decision trees instead of huge trees, is promising for achieving high accuracy and avoiding overfitting the training data. One major drawback of this approach, however, is that the accuracies of simple individual hypotheses may not always be high, which demands a way of computing more accurate (or the most accurate) simple hypotheses efficiently. In this paper, we consider several classes of simple but expressive hypotheses, such as ranges and regions over numeric attributes, subsets of categorical values, and conjunctions of Boolean tests. For each class, we develop an efficient algorithm for choosing the optimal hypothesis.
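The paper's own algorithms are not reproduced here. As a minimal sketch of the kind of weak-learner optimization the abstract describes, the hypothetical function below finds the optimal single-test ("range") hypothesis on one numeric attribute by a single sorted sweep, minimizing the weighted classification error that AdaBoost's example weights induce. All names and the interface are assumptions for illustration, not the paper's method.

```python
def best_stump(xs, ys, ws):
    """Return (threshold, polarity, weighted_error) for the single-test
    hypothesis h(x) = polarity if x <= threshold else -polarity
    that minimizes the weighted error sum(w_i over misclassified i).

    xs: numeric attribute values; ys: labels in {-1, +1}; ws: nonnegative weights.
    Illustrative sketch only; names are hypothetical.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    total_pos = sum(w for y, w in zip(ys, ws) if y == +1)
    total_neg = sum(w for y, w in zip(ys, ws) if y == -1)

    # Degenerate cut below all values: predict everything -1 (or +1).
    best = min((float("-inf"), +1, total_pos),
               (float("-inf"), -1, total_neg),
               key=lambda b: b[2])

    pos_left = neg_left = 0.0
    for k, i in enumerate(order):
        if ys[i] == +1:
            pos_left += ws[i]
        else:
            neg_left += ws[i]
        # Only cut between distinct attribute values.
        if k + 1 < len(order) and xs[order[k + 1]] == xs[i]:
            continue
        # Misclassified weight: wrong labels on the left + wrong on the right.
        err_plus = neg_left + (total_pos - pos_left)   # predict +1 on the left
        err_minus = pos_left + (total_neg - neg_left)  # predict -1 on the left
        if err_plus < best[2]:
            best = (xs[i], +1, err_plus)
        if err_minus < best[2]:
            best = (xs[i], -1, err_minus)
    return best
```

After sorting once, the sweep evaluates every candidate threshold in O(n) by maintaining running weight sums, so re-running it each boosting round with updated weights stays cheap; the same prefix-sum idea extends to the richer hypothesis classes (regions, value subsets, conjunctions) the paper treats.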

Publication details

DOI: 10.1007/3-540-45884-0_35

Full citation:

Morishita, S. (2002). Computing optimal hypotheses efficiently for boosting, in S. Arikawa & A. Shinohara (eds.), Progress in Discovery Science, Dordrecht, Springer, pp. 471-481.
