%0 Journal Article
%T 基于词频分析的K-Means特征聚类算法的《红楼梦》作者分析
Analysis of the Author of A Dream of Red Mansions Based on K-Means Feature Clustering Algorithm with Word Frequency
%A 郑佳莉
%A 柯小玲
%A 江晓莹
%A 陈淑悦
%J Hans Journal of Data Mining
%P 73-79
%@ 2163-1468
%D 2022
%I Hans Publishing
%R 10.12677/HJDM.2022.121008
%X 本文提出一种“基于词频分析的K-means特征聚类算法”来分析存疑文献的作者信息。以《红楼梦》为例,根据在前80回和后40回中确定的特征汉字的出现频率,用基于词频分析的K-means特征聚类算法对其分析。以每10回为一个文本,研究前、中、后四十回的相似度,从而得出《红楼梦》的前八十回与后四十回很可能并非一人所作的论断。
This paper proposes a K-means feature clustering algorithm based on word frequency analysis for analyzing the authorship of documents of disputed origin. Taking A Dream of Red Mansions as an example, the algorithm is applied to the occurrence frequencies of characteristic Chinese characters identified in the first 80 chapters and the last 40 chapters. Treating every 10 chapters as one text, the similarity among the first, middle, and last 40 chapters is studied, leading to the conclusion that the first 80 chapters and the last 40 chapters of A Dream of Red Mansions were probably not written by the same person.
%K 词频,K-Means特征聚类算法,相似度
Word Frequency
%K K-Means Feature Clustering Algorithm
%K Similarity
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=48385