%0 Journal Article
%T Subspace Clustering of High-Dimensional Data: An Evolutionary Approach
%A Singh Vijendra
%A Sahoo Laxman
%J Applied Computational Intelligence and Soft Computing
%D 2013
%I Hindawi Publishing Corporation
%R 10.1155/2013/863146
%X Clustering high-dimensional data has been a major challenge due to the inherent sparsity of the points. Most existing clustering algorithms become substantially inefficient if the required similarity measure is computed between data points in the full-dimensional space. In this paper, we present a robust multiobjective subspace clustering (MOSCL) algorithm for the challenging problem of high-dimensional clustering. The first phase of MOSCL performs subspace relevance analysis by detecting dense and sparse regions and their locations in the data set. After detecting dense regions, it eliminates outliers. MOSCL then discovers subspaces in the dense regions of the data set and produces subspace clusters. In thorough experiments on synthetic and real-world data sets, we demonstrate that MOSCL is superior to the PROCLUS clustering algorithm for subspace clustering. Additionally, we investigate the effect of the first phase, which detects dense regions, on the results of subspace clustering. Our results indicate that removing outliers improves the accuracy of subspace clustering. The clustering results are validated by the clustering error (CE) distance on various data sets. MOSCL discovers clusters in all subspaces with high quality and outperforms PROCLUS in efficiency.

1. Introduction

The clustering problem concerns the discovery of homogeneous groups of data according to a certain similarity measure. The task of clustering has been studied in statistics [1], machine learning [2–4], bioinformatics [3, 5–7], and more recently in databases [8–10]. Clustering algorithms find a partition of the points such that points within a cluster are more similar to each other than to points in different clusters [11].
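To make the contrast between full-dimensional and subspace similarity concrete, the following sketch (illustrative only; the function names, the example points, and the chosen subspace are assumptions, not part of the paper) compares a conventional Euclidean distance, in which every dimension contributes equally, with a distance restricted to a relevant subspace:

```python
import math

def full_distance(p, q):
    # Conventional Euclidean distance: every dimension weighted equally.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def subspace_distance(p, q, dims):
    # Euclidean distance restricted to the relevant subspace `dims`.
    return math.sqrt(sum((p[i] - q[i]) ** 2 for i in dims))

# Two points that agree on dimensions 0 and 1 but differ wildly elsewhere.
p = (1.0, 2.0, 50.0, -30.0)
q = (1.0, 2.0, -40.0, 60.0)

d_full = full_distance(p, q)              # dominated by the irrelevant dimensions
d_sub = subspace_distance(p, q, (0, 1))   # 0.0: the points coincide in the subspace
```

In the full space the two points look far apart, even though they are identical in the subspace spanned by the first two dimensions; this is the effect that motivates restricting the similarity measure to relevant subspaces.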
In traditional clustering, each dimension is equally weighted when computing the distance between points. Most of these algorithms perform well in clustering low-dimensional data sets [12–15]. However, in higher-dimensional feature spaces, their performance and efficiency deteriorate considerably due to the high dimensionality [16]. In the clustering task, the problem of high dimensionality presents a dual aspect. First, the presence of irrelevant attributes can obscure any clustering tendency, because such features cause the algorithm to search for clusters where none exist. This also happens with low-dimensional data, but both the likelihood of irrelevant features being present and their number grow with the dimensionality. The second
%U http://www.hindawi.com/journals/acisc/2013/863146/
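The deterioration described above is often attributed to distance concentration: as dimensionality grows, the nearest and farthest neighbors become almost equidistant, so a full-dimensional distance loses discriminative power. A minimal sketch of this effect, assuming i.i.d. Gaussian data purely for illustration (the helper name and parameters are assumptions, not from the paper):

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    # Relative contrast (max - min) / min of the distances from the origin
    # to n_points random standard-Gaussian points in `dim` dimensions.
    rng = random.Random(seed)
    dists = []
    for _ in range(n_points):
        point = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        dists.append(math.sqrt(sum(x * x for x in point)))
    return (max(dists) - min(dists)) / min(dists)

contrast_low = relative_contrast(2)     # low dimension: distances vary widely
contrast_high = relative_contrast(500)  # high dimension: distances concentrate
```

With the fixed seed, the relative contrast in 500 dimensions is a small fraction of that in 2 dimensions, which is why algorithms that rely on full-dimensional distances struggle to separate clusters from noise in high-dimensional spaces.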