%0 Journal Article %T A comparison of machine learning techniques for survival prediction in breast cancer %A Leonardo Vanneschi %A Antonella Farinaccio %A Giancarlo Mauri %A Mauro Antoniotti %A Paolo Provero %A Mario Giacobini %J BioData Mining %D 2011 %I BioMed Central %R 10.1186/1756-0381-4-12 %X We show that Genetic Programming performs significantly better than Support Vector Machines, Multilayered Perceptrons and Random Forests in classifying patients from the NKI breast cancer dataset, and comparably to the scoring-based method originally proposed by the authors of the 70-gene signature. Furthermore, Genetic Programming is able to perform an automatic feature selection.Since the performance of Genetic Programming is likely to be improvable compared to the out-of-the-box approach used here, and given the biological insight potentially provided by the Genetic Programming solutions, we conclude that Genetic Programming methods are worth further investigation as a tool for cancer patient classification based on gene expression data.Current cancer therapies have serious side effects: ideally type and dosage of the therapy should be matched to each individual patient based on his/her risk of relapse. Therefore the classification of cancer patients into risk classes is a very active field of research, with direct clinical applications. Until recently patient classification was based on a series of clinical and histological parameters. The advent of high-throughput techniques to measure gene expression led in the last decade to a large body of research on gene expression in cancer, and in particular on the possibility of using gene expression data to improve patient classification. A gene signature is a set of genes whose levels of expression can be used to predict a biological state (see [1]): in the case of cancer, gene signatures have been developed both to distinguish cancerous from non-cancerous conditions and to classify cancer patients based on the aggressiveness of the tumor, as measured for example by the probability of relapsing within a given time.While many studies have been devoted to the identification of gene signatures in various types of cancer, the question of the algorithms to be used to maximize the predictive power of a gene signature has re %U http://www.biodatamining.org/content/4/1/12