A New-Fangled FES-k-Means Clustering Algorithm for Disease Discovery and Visual Analytics
DOI: 10.1155/2010/746021

Abstract: Clustering is the operation of grouping objects within a dataset that share similar qualities into homogeneous groups [1]. It allows for the discovery of similarities and differences among patterns in order to derive useful conclusions about them [2]. Determining the structure or patterns within data is a significant component of classification and visualization, and it enables geospatial mining of high-volume datasets. Although many clustering techniques have been developed over the years (some as improvements and others as revisions of earlier methods), the most common and flexible is k-means clustering [3]. The primary function of the k-means algorithm is to partition data into k disjoint subgroups; the quality of these clusters is then measured with various validation methods. The original k-means method, however, is known to be weak in three major areas: (i) it is computationally expensive for large-scale datasets; (ii) it requires the clusters to be initialized a priori; and (iii) it suffers from the local minima search problem [4, 5]. The first report to address these concerns about the k-means clustering technique was published as a book chapter [6]. In this paper, we analyze three distinct datasets and make additional improvements to the implementation of the algorithm. For one of the experimental datasets, postprocessing of the discovered clusters included a detailed fieldwork component, which revealed key implications for disease mechanism discovery.

This paper is motivated by an increasing demand for better visual exploration and data mining tools that function efficiently in data-rich and computationally rich environments. Clustering techniques have played a significant role in advancing the knowledge derived from such environments.
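To make the baseline procedure and its three weaknesses concrete, here is a minimal sketch of standard k-means (Lloyd iterations) in Python. It is *not* the FES-k-means algorithm of this paper. It swaps in k-means++ seeding and best-of-several restarts, two common remedies that the source does not mention, and the function names (`kmeans`, `lloyd`, `kmeans_pp_init`) and defaults (`n_init=10`, `max_iter=100`) are illustrative assumptions. The sketch shows where each weakness arises: every iteration compares all n points with all k centres (cost), k and the starting centres must be fixed before the run (a priori initialization), and different seeds can settle in different local minima of the within-cluster sum of squared errors (SSE), which is why the lowest-SSE run is kept.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(data: List[Point], k: int, rng: random.Random) -> List[Point]:
    """k-means++ seeding (assumed remedy, not from the paper): each new centre is
    drawn with probability proportional to its squared distance from the nearest
    centre chosen so far."""
    centers = [rng.choice(data)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in data]
        if sum(weights) == 0:  # every point already coincides with a centre
            centers.append(rng.choice(data))
        else:
            centers.append(rng.choices(data, weights=weights, k=1)[0])
    return centers


def lloyd(data: List[Point], centers: List[Point],
          max_iter: int) -> Tuple[List[Point], Optional[List[int]]]:
    """Plain Lloyd iterations: assign each point to its nearest centre, then move
    each centre to the mean of its members, until the labels stop changing."""
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # O(n * k * d) work per iteration: the cost that grows with dataset size.
        new_labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                      for p in data]
        if new_labels == labels:
            break  # converged, possibly only to a local minimum
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centers[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers, labels


def kmeans(data: List[Point], k: int, n_init: int = 10, max_iter: int = 100,
           seed: int = 0):
    """Run seeding + Lloyd n_init times and keep the lowest-SSE result. This
    reduces, but does not eliminate, the risk of a poor local minimum."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centers, labels = lloyd(data, kmeans_pp_init(data, k, rng), max_iter)
        sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(data, labels))
        if best is None or sse < best[0]:
            best = (sse, centers, labels)
    return best  # (sse, centers, labels)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    sse, centers, labels = kmeans(pts, k=2)
    print(labels, round(sse, 3))
```

Here the SSE doubles as a simple internal validation measure. Note that k itself is still an input, so the a priori initialization problem is only softened by better seeding, not removed.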
In addition, clustering techniques have been applied to many different areas of study, including, but not limited to, gene expression data [7, 8], georeferencing of biomedical data to support disease informatics research [9, 10] in terms of