OALib Journal (ISSN: 2333-9721)

Design of Automatic Speech Recognition System Based on Throat Vibration

DOI: 10.12677/MOS.2024.131035, PP. 365-376

Keywords: Laryngeal Cartilage Vibration, Deep Learning, Speech Recognition, Convolutional Neural Network


Abstract:

Existing, mature speech recognition systems are largely limited to healthy speakers and mainstream languages, and are not applicable to patients with impaired vocal cords. This paper therefore presents the design of an automatic speech recognition system based on throat vibration, aiming to provide a feasible aid for the rehabilitation training and daily life of patients with vocal cord damage and of speech-impaired groups. The Mintti Smartho-D2 intelligent digital stethoscope was used to capture laryngeal cartilage vibration signals, and three mainstream deep learning architectures for speech recognition (a convolutional neural network, CNN; a convolutional long short-term memory network, CLSTM; and a convolutional recurrent neural network, CRNN) were each trained repeatedly on the throat vibration dataset, with the goal of converting laryngeal cartilage vibration signals into normal speech signals. Comparative experiments yielded test word error rates of 0.1572, 0.2018, and 0.06787 for the three models, respectively. The CRNN achieved the best recognition performance, with a word error rate below 0.07 in a quiet environment. These results provide preliminary validation of the design's feasibility and show that the CRNN model delivers good performance in both efficiency and recognition accuracy.
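The comparison above ranks the models by word error rate, which the paper does not define explicitly. For Chinese ASR this metric is conventionally computed per character, as the token-level Levenshtein (edit) distance between the recognized and reference transcripts divided by the reference length. A minimal sketch of that standard computation (not the authors' own evaluation code) follows:

```python
def edit_distance(ref, hyp):
    """Token-level Levenshtein distance via dynamic programming."""
    m, n = len(ref), len(hyp)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i          # delete all remaining reference tokens
    for j in range(n + 1):
        dp[0][j] = j          # insert all remaining hypothesis tokens
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[m][n]

def char_error_rate(reference, hypothesis):
    """Character error rate: edits / reference length.

    For Chinese transcripts the "word" error rate is usually taken
    per character, so the strings are compared character by character.
    """
    ref, hyp = list(reference), list(hypothesis)
    return edit_distance(ref, hyp) / len(ref)

# One substituted character out of six gives a rate of 1/6.
print(round(char_error_rate("喉部振动信号", "喉部震动信号"), 4))  # → 0.1667
```

Under this definition, the reported CRNN figure of 0.06787 means fewer than 7 recognition errors per 100 reference characters.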

