Hans Journal of Data Mining 2022

Research on K-Means Clustering Optimization Algorithm Based on Machine Learning

DOI: 10.12677/HJDM.2022.121003, PP. 20-26

李贞, 刘海燕, 刘策, 李庆钰, 刘刚

Keywords: Improved K-Means Algorithm, Mini Batch K-Means Algorithm, Data Mining


Abstract:

K-Means clustering is a typical partition-based clustering algorithm and a foundation for the study of machine learning algorithms. It automatically groups similar samples into the same class, and choosing the value of K and the K initial cluster centers sensibly improves the clustering result. With appropriate preprocessing, it supports preliminary analysis of the data and can even uncover hidden valuable information. Compared with machine learning algorithms such as SVM and GBDT, it is simple to apply, uses the sum-of-squared-errors criterion function, and offers high scalability and compressibility on large data sets. However, the algorithm still has several problems: randomly chosen initial cluster centers make it unstable, the value of K is hard to select, and it converges with great difficulty on non-convex data sets. To improve the effect of cluster analysis in data mining, this paper analyzes data mining, cluster analysis, and the traditional K-Means algorithm, and on that basis proposes an improved K-Means algorithm. Experiments show that the improved algorithm effectively raises cluster quality as well as the efficiency and stability of the algorithm, so that it provides more accurate and effective service while reducing algorithm overhead.
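For readers unfamiliar with the baseline, the following is a minimal sketch of the traditional K-Means procedure and its sum-of-squared-errors criterion, written in plain Python. It is the textbook algorithm, not the improved algorithm proposed in the paper, whose details the abstract does not give. All function names and the toy data are illustrative assumptions. Repeating the run with several random seeds and keeping the lowest criterion value is one common, simple way to cope with the instability caused by random initial centers.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Return, for every point, the index of its nearest center."""
    return [min(range(len(centers)),
                key=lambda j: squared_distance(p, centers[j]))
            for p in points]


def sse(points, centers, labels):
    """Sum-of-squared-errors criterion that K-Means tries to minimize."""
    return sum(squared_distance(p, centers[l]) for p, l in zip(points, labels))


def kmeans(points, k, seed=0, max_iter=100):
    """Traditional K-Means with randomly chosen initial centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = assign(points, centers)
    for _ in range(max_iter):
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
        new_labels = assign(points, centers)
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
    return centers, labels, sse(points, centers, labels)


# Toy data, made up for illustration: two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]

# The result depends on the random start, so run several seeds
# and keep the run with the lowest criterion value.
results = [kmeans(points, k=2, seed=s) for s in range(5)]
centers, labels, best_sse = min(results, key=lambda r: r[2])
```

Here K is fixed at 2 by hand. Choosing K automatically, which the abstract names as a separate difficulty, is outside the scope of this sketch.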

