全部 标题 作者
关键词 摘要

OALib Journal期刊
ISSN: 2333-9721
费用:99美元

查看量下载量

相关文章

更多...

Efficient Data Clustering Algorithms: Improvements over Kmeans

Keywords: Data Clustering , Random Initialization , Kmeans , K-Nearest Neighbor , Density , Noise , Outlier , Data Point

Full-Text   Cite this paper   Add to My Lib

Abstract:

This paper presents a new approach to overcome one of the most known disadvantages of the well-known Kmeans clustering algorithm. The problems of classical Kmeans are such as the problem of random initialization of prototypes and the requirement of predefined number of clusters in the dataset. Randomly initialized prototypes can often yield results to converge to local rather than global optimum. A better result of Kmeans may be obtained by running it many times to get satisfactory results. The proposed algorithms are based on a new novel definition of densities of data points which is based on the k-nearest neighbor method. By this definition we detect noise and outliers which affect Kmeans strongly, and obtained good initial prototypes from one run with automatic determination of K number of clusters. This algorithm is referred to as Efficient Initialization of Kmeans (EI-Kmeans). Still Kmeans algorithm used to cluster data with convex shapes, similar sizes, and densities. Thus we develop a new clustering algorithm called Efficient Data Clustering Algorithm (EDCA) that uses our new definition of densities of data points. The results show that the proposed algorithms improve the data clustering by Kmeans. EDCA is able to detect clusters with different non-convex shapes, different sizes and densities.

Full-Text

comments powered by Disqus

Contact Us

service@oalib.com

QQ:3279437679

WhatsApp +8615387084133

WeChat 1538708413