%0 Journal Article
%T Optimization of the Levenshtein algorithm and its application to duplicate detection in question banks
%A 张衡
%A 陈良育
%J Journal of East China Normal University (Natural Science)
%D 2018
%R 10.3969/j.issn.1000-5641.2018.05.013
%X To address the inefficiency of the Levenshtein distance algorithm on long texts and in large-scale matching, this paper proposes an early termination strategy for the algorithm. First, a recurrence relation is derived from the intrinsic relationship between the elements of the Levenshtein distance matrix. Based on this relation, an early termination strategy is proposed that can determine ahead of time whether two texts satisfy a predefined similarity threshold. Duplicate-detection experiments on question banks from several subjects show that the early termination strategy significantly reduces computation time.
%K question bank matching
%K text similarity
%K Levenshtein edit distance
%U http://xblk.ecnu.edu.cn/CN/abstract/abstract25556.shtml
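
The abstract does not state the recurrence relation or the exact termination test, so the following is only a minimal Python sketch of one common form of early termination, not the authors' method. It relies on a well-known property of the distance matrix: the minimum value of each row never decreases from one row to the next, so once a row's minimum exceeds the allowed distance, the final distance must exceed it too. The function name `levenshtein_within` and the parameter `max_dist` are hypothetical names chosen for this illustration.

```python
from typing import Optional


def levenshtein_within(a: str, b: str, max_dist: int) -> Optional[int]:
    """Return the Levenshtein distance of a and b if it is <= max_dist, else None.

    Illustrative sketch only: the cutoff used here is the standard
    row-minimum bound, assumed for this example.
    """
    # The length difference is a lower bound on the edit distance.
    if abs(len(a) - len(b)) > max_dist:
        return None
    # Keep the shorter string on the inner loop so each row is short.
    if len(a) < len(b):
        a, b = b, a

    prev = list(range(len(b) + 1))  # row 0: distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # column 0: distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        # Row minima are non-decreasing, so no later cell can fall
        # back under the budget once this row's minimum exceeds it.
        if min(curr) > max_dist:
            return None
        prev = curr

    return prev[-1] if prev[-1] <= max_dist else None
```

To apply this to a similarity threshold, the threshold would first be converted into a distance budget. Under the common (here assumed) definition similarity = 1 - distance / max(len(a), len(b)), a pair can meet a threshold t only if its distance is at most (1 - t) * max(len(a), len(b)).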