%0 Journal Article
%T A Voice Cloning Model for Rhythm and Content De-Entanglement
%A 王萌
%A 姜丹
%A 曹少中
%J Artificial Intelligence and Robotics Research
%P 166-176
%@ 2326-3423
%D 2024
%I Hans Publishing
%R 10.12677/AIRR.2024.131018
%X Voice cloning is a technique that synthesizes speech closely resembling a reference voice, using algorithms such as speech analysis, speaker classification, and speech encoding. To improve the transfer of a speaker's individual pronunciation characteristics, the MRCD model, which disentangles rhythm from content, is proposed. Random-threshold resampling in a rhythm random perturbation module disentangles the rhythm information carried by the speech signal, so that rhythm is represented independently of content. A Mel content enhancement module extracts content features that reflect the speaker's characteristic way of speaking. In addition, a style loss and a cycle-consistency loss are introduced to measure the differences between the generated speech and the source speech in terms of spectrogram and speaker identity. Finally, the end-to-end speech synthesis model FastSpeech2 is used to perform voice cloning. For experimental evaluation, the method is applied to a voice conversion task on the publicly available AISHELL3 dataset. Evaluation with objective and subjective metrics shows that the converted speech outperforms previous methods in speaker similarity while maintaining its naturalness score.
%K Voice Cloning
%K Zero-Shot
%K Speaker Representation
%K Content Enhancement
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=81978