%0 Journal Article
%T Is Bagging Effective in the Classification of Small-Sample Genomic and Proteomic Data?
%A TT Vu
%A UM Braga-Neto
%J EURASIP Journal on Bioinformatics and Systems Biology
%D 2009
%I BioMed Central
%R 10.1155/2009/158368
%X Randomized ensemble methods for classifier design combine the decision of an ensemble of classifiers designed on randomly perturbed versions of the available data [1–5]. The combination is often done by means of majority voting among the individual classifier decisions [4–6], whereas the data perturbation usually employs the bootstrap resampling approach, which corresponds to sampling uniformly with replacement from the original data [7, 8]. The combination of bootstrap resampling and majority voting is known as bootstrap aggregation or bagging [4, 5]. There has been considerable interest recently in the application of bagging in the classification of both gene-expression data [9–12] and protein-abundance mass spectrometry data [13–18]. However, there is scant theoretical justification for the use of this heuristic, other than the expectation that combining the decision of several classifiers will regularize and improve the performance of unstable overfitting classification rules, such as unpruned decision trees, provided one uses a large enough number of classifiers in the ensemble [4, 5]. It is also claimed that ensemble rules "do not overfit," meaning that classification error converges as the number of component classifiers tends to infinity [5]. However, the main performance issue is not whether the ensemble scheme improves the classification error of a single unstable overfitting classifier, or whether its classification error converges to a fixed limit; these are important questions, which have been studied in the literature (in particular when the component classifiers are decision trees) [5, 19–23], but the question of main practical interest is whether the ensemble scheme will improve the performance of unstable overfitting classifiers sufficiently to beat the performance of single stable, nonoverfitting classifiers, particularly in small-sample settings.
Therefore, there is a pressing need to examine rigorously the suitability and validity of the ensemble approach
%U http://bsb.eurasipjournals.com/content/2009/1/158368