|
BMC Bioinformatics 2012
ICGE: an R package for detecting relevant clusters and atypical units in gene expressionAbstract: ICGE is a user-friendly R package which provides many functions related to this problem: identify the number of clusters using mixed variables, usually found by applied biomedical researchers; detect whether the data have a cluster structure; identify whether a new unit belongs to one of the pre-identified clusters or to a novel group, and classify new units into the corresponding cluster. The functions in the ICGE package are accompanied by help files and easy examples to facilitate its use.We demonstrate the utility of ICGE by analyzing simulated and real data sets. The results show that ICGE could be very useful to a broad research community.There is considerable interest among researches in using cluster methods. For example, a common approach in many biomedical applications is to seek a reliable and precise classification of genes into a number of clusters, which is essential for understanding the bases of complex diseases. For instance, an accurate classification of tumors is essential to successful diagnosis and treatment of cancer. Clustering algorithms attempt to partition the units into groups that have similar properties and it is necessary to identify the value of k at which the final partition appears to be the best. There is considerable interest among researches in using cluster methods, which can be generally found in R packages on the Comprehensive R Archive Network (CRAN, http://CRAN.R-project.org webcite). An important problem associated with the classification of units is to assess whether the clustering process finds a relevant partition, and to identify new classes of units. For example, if genes are classified into groups exhibiting similar patterns of gene expression variation, it is necessary to pay attention to two things. First the correct classification in k clusters of the genes by an unsupervised method. Usually, when a clustering algorithm is applied to a set of units, although the data do not present a cluster structure, the algorithm
|