%0 Journal Article
%T Evolutionary Algorithms for Robust Density-Based Data Clustering
%A Amit Banerjee
%J ISRN Computational Mathematics
%D 2013
%R 10.1155/2013/931019
%X Density-based clustering methods are known to be robust against outliers in data; however, they are sensitive to user-specified parameters, the selection of which is not trivial. Moreover, relational data clustering is an area that has received considerably less attention than object data clustering. In this paper, two approaches to robust density-based clustering for relational data using evolutionary computation are investigated.
1. Introduction
Clustering, an integral machine learning activity, involves unsupervised classification of data into self-similar clusters: entities within a cluster are alike, and entities across clusters are not. A cluster is defined in terms of internal homogeneity and external separation or, in density-related terms, clusters are dense regions in feature space separated from one another by sparser regions. Datasets themselves can be divided into two groups, object data and relational data, a distinction that is described later in the paper. While much research effort has gone into developing clustering algorithms for object data, clustering methods for relational data have received less attention. In application domains such as the social sciences and bioinformatics, relational datasets are more common than object data. Prototype-based clustering algorithms are popular for clustering object data, where a cluster can be represented by a cluster prototype, and the algorithms are built around optimizing the parameters of the prototypes. Most optimization is done iteratively from a randomly chosen initial state and, as it turns out, prototype-based object clustering is very sensitive to this initialization. Evolutionary algorithms and other approaches that operate on a population of potential solutions have lately been used as a remedy for the curse of initialization. Real-life data is also inherently noisy, and prototype-based clustering methods have been shown to be adversely affected by noise in data.
Unless guarded against, the presence of outliers in data influences the calculation of prototype parameters. Density-based clustering algorithms are resistant to outliers if it can be assumed that outliers occupy the less dense regions of the feature space. Density-based spatial clustering of applications with noise (DBSCAN) is the most popular density-based clustering algorithm [1]. Although resistant to outliers and easily adapted to large-scale data clustering, DBSCAN and its variants still suffer from the problem of pre-specification of two important parameters, described later, which in practice is not always
%U http://www.hindawi.com/journals/isrn.computational.mathematics/2013/931019/