ISRN Computational Mathematics 2013

Evolutionary Algorithms for Robust Density-Based Data Clustering

DOI: 10.1155/2013/931019

Amit Banerjee

Abstract:

Density-based clustering methods are known to be robust against outliers in data; however, they are sensitive to user-specified parameters, the selection of which is not trivial. Moreover, relational data clustering has received considerably less attention than object data clustering. In this paper, two approaches to robust density-based clustering for relational data using evolutionary computation are investigated.

1. Introduction

Clustering, an integral machine learning activity, involves unsupervised classification of data into self-similar clusters: entities within a cluster are alike, and entities in different clusters are not. A cluster is defined in terms of internal homogeneity and external separation or, in density-related terms, clusters are dense regions in feature space separated from one another by sparser regions. Datasets themselves can be divided into two groups, object data and relational data, a distinction described later in the paper. While much research effort has gone into developing clustering algorithms for object data, clustering methods for relational data have received less attention. In application domains such as the social sciences and bioinformatics, relational datasets are more common than object data.

Prototype-based clustering algorithms are popular for clustering object data: each cluster is represented by a cluster prototype, and the algorithms are built around optimizing the parameters of the prototypes. Most of this optimization is done iteratively from a randomly chosen initial state, and prototype-based object clustering turns out to be very sensitive to this initialization. Evolutionary algorithms and other approaches that operate on a population of potential solutions have lately been used as a remedy for this curse of initialization.

Real-life data is also inherently noisy, and prototype-based clustering methods have been shown to be adversely affected by noise in data. Unless guarded against, the presence of outliers in data influences the calculation of prototype parameters. Density-based clustering algorithms are resistant to outliers if it can be assumed that outliers occupy the less dense regions of the feature space. Density-based spatial clustering of applications with noise (DBSCAN) is the most popular density-based clustering algorithm [1]. Although resistant to outliers and easily adapted to large-scale data clustering, DBSCAN and its variants still suffer from the problem of pre-specification of two important parameters, described later, which in practice is not always
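To make the parameter sensitivity concrete, the following is a minimal sketch of plain DBSCAN [1] run on relational data, that is, on a matrix of pairwise dissimilarities rather than on feature vectors. It is not the evolutionary method investigated in the paper. The two parameters are assumed to be the neighbourhood radius and minimum neighbour count of the original DBSCAN formulation (named `eps` and `min_pts` here), and the toy points, the Euclidean dissimilarity, and all function names are illustrative assumptions.

```python
import math

NOISE = -1        # label for items assigned to no cluster
UNVISITED = None  # label for items not yet examined


def pairwise_dissimilarity(points):
    """Turn object data (feature vectors) into relational data (n x n matrix)."""
    return [[math.dist(p, q) for q in points] for p in points]


def dbscan_relational(dissim, eps, min_pts):
    """Label items using only the pairwise dissimilarity matrix `dissim`."""
    n = len(dissim)
    labels = [UNVISITED] * n

    def neighbours(i):
        # Items within eps of item i; item i itself is included since d(i, i) = 0.
        return [j for j in range(n) if dissim[i][j] <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border item
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border item: reachable but not core
                continue
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reach = neighbours(j)
            if len(reach) >= min_pts:  # j is a core item, so keep expanding
                queue.extend(reach)
        cluster += 1
    return labels


# Assumed toy data: two tight groups of four points and one distant outlier.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0), (9.0, 9.0),
          (4.5, 20.0)]
dissim = pairwise_dissimilarity(points)

for eps in (0.5, 1.5, 25.0):
    print(eps, dbscan_relational(dissim, eps, min_pts=3))
# 0.5  -> every item is noise
# 1.5  -> two clusters, with the outlier labelled as noise
# 25.0 -> one cluster that also absorbs the outlier
```

On this toy data only the middle setting recovers the two groups and rejects the outlier, which is the kind of dependence on pre-specified parameters that motivates searching over them automatically.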

References

[1]	M. Ester, H. P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD '96), AAAI Press, New York, NY, USA, 1996.
[2]	R. J. Hathaway and J. C. Bezdek, “NERF c-means: non-Euclidean relational fuzzy clustering,” Pattern Recognition, vol. 27, no. 3, pp. 429–437, 1994.
[3]	R. J. Hathaway, J. W. Davenport, and J. C. Bezdek, “Relational duals of the c-means clustering algorithms,” Pattern Recognition, vol. 22, no. 2, pp. 205–212, 1989.
[4]	R. N. Davé and S. Sen, “Robust fuzzy clustering of relational data,” IEEE Transactions on Fuzzy Systems, vol. 10, no. 6, pp. 713–727, 2002.
[5]	E. Falkenauer, Genetic Algorithms and Grouping Problems, John Wiley & Sons, Chichester, UK, 1998.
[6]	E. R. Hruschka, R. J. G. B. Campello, A. A. Freitas, and A. C. P. L. F. de Carvalho, “A survey of evolutionary algorithms for clustering,” IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews, vol. 39, no. 2, pp. 133–155, 2009.
[7]	O. Nasraoui and R. Krishnapuram, “Clustering using a genetic fuzzy least median of squares algorithm,” in Proceedings of the 16th Annual Meeting of the North American Fuzzy Information Processing Society (NAFIPS '97), pp. 217–221, September 1997.
[8]	A. Banerjee, “Robust fuzzy clustering as a multi-objective optimization procedure,” in Proceedings of the 28th Annual Meeting of the North American Fuzzy Information Processing Society (NAFIPS '09), June 2009.
[9]	A. Banerjee, “An improved genetic algorithm for robust fuzzy clustering with unknown number of clusters,” in Proceedings of the 29th Annual North American Fuzzy Information Processing Society Conference (NAFIPS '10), July 2010.
[10]	N. R. Pal, V. K. Eluri, and G. K. Mandal, “Fuzzy logic approaches to structure preserving dimensionality reduction,” IEEE Transactions on Fuzzy Systems, vol. 10, no. 3, pp. 277–286, 2002.
[11]	T. A. Runkler, “Fuzzy nonlinear projection,” in Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ-IEEE '03), pp. 863–868, May 2003.
[12]	D. L. Davies and D. W. Bouldin, “A cluster separation measure,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 1, no. 2, pp. 224–227, 1978.
[13]	A. Tucker, J. Crampton, and S. Swift, “RGFGA: an efficient representation and crossover for grouping genetic algorithms,” Evolutionary Computation, vol. 13, no. 4, pp. 477–499, 2005.
[14]	M. Ankerst, M. M. Breunig, H. P. Kriegel, and J. Sander, “OPTICS: ordering points to identify the clustering structure,” in Proceedings of the SIGMOD International Conference on Management of Data, pp. 19–40, 1999.
[15]	X. Xu, M. Ester, H. P. Kriegel, and J. Sander, “Distribution-based clustering algorithm for mining in large spatial databases,” in Proceedings of the 14th International Conference on Data Engineering, pp. 324–331, February 1998.
[16]	C. J. Merz and P. M. Murphy, “UCI Repository of Machine Learning Databases,” University of California, Irvine, Calif, USA, 2007, http://www.ics.uci.edu/~mlearn.
[17]	E. R. Hruschka and N. F. Ebecken, “A genetic algorithm for cluster analysis,” Intelligent Data Analysis, vol. 7, pp. 15–25, 2003.
[18]	A. Banerjee and S. J. Louis, “A genetic algorithm implementation of the fuzzy least trimmed squares clustering,” in Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ-IEEE '07), July 2007.
[19]	R. Kothari and D. Pitts, “On finding the number of clusters,” Pattern Recognition Letters, vol. 20, no. 4, pp. 405–416, 1999.
