Frame Length Dependency for Fundamental Frequency Extraction in Noisy Speech

DOI: 10.4236/jsip.2024.151001, PP. 1-17

Keywords: Pitch Estimation, Fundamental Frequency, BaNa, ACF, Frame Length


Abstract:

The fundamental frequency plays a significant role in how the pitch of a sound is perceived, and pitch is a basic attribute used in many speech-related tasks. Several algorithms have been developed for fundamental frequency extraction; which one to use depends on the signal's characteristics and the surrounding noise, so an algorithm's noise resistance is critical for precise estimation. Nonetheless, many state-of-the-art algorithms struggle to achieve satisfactory results on noisy speech recordings with low signal-to-noise ratio (SNR) values. In addition, most recent techniques use different frame lengths for pitch extraction. From this point of view, this research considers different frame lengths for fundamental frequency extraction from male and female speech signals. It also analyzes the frame length dependency of the speech signal to determine which frame length is more suitable and effective for male and female speech specifically. To validate the idea, we use the conventional autocorrelation function (ACF) and the state-of-the-art method BaNa. The study proposes an idea intended to improve speech processing applications that operate on noisy speech. The experimental results indicate which frame length is more appropriate for male and female speech signals in noisy environments.
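To make the frame length dependency concrete, the following is a minimal sketch of textbook autocorrelation peak picking on a single frame. It is not the paper's implementation: the function names, the 8000 Hz sampling rate, the 50-400 Hz search range, and the synthetic sine test tones are all assumptions chosen for illustration.

```python
import math

def acf_pitch(frame, fs, f_min=50.0, f_max=400.0):
    """Estimate F0 (Hz) of one frame from the peak of its autocorrelation.

    Illustrative sketch only; f_min and f_max are assumed search limits.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]            # remove the DC offset
    lag_min = int(fs / f_max)                # shortest period searched
    lag_max = min(int(fs / f_min), n - 1)    # longest period that fits the frame
    if lag_max < lag_min:
        raise ValueError("frame too short for the requested search range")
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized (biased) autocorrelation at this lag
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return fs / best_lag

def sine(freq, fs, n):
    """Synthetic test tone standing in for a voiced speech frame."""
    return [math.sin(2.0 * math.pi * freq * i / fs) for i in range(n)]

fs = 8000                                          # assumed sampling rate
long_frame = acf_pitch(sine(100.0, fs, 320), fs)   # 40 ms frame, 100 Hz tone
short_frame = acf_pitch(sine(100.0, fs, 80), fs)   # 10 ms frame, same tone
high_tone = acf_pitch(sine(250.0, fs, 80), fs)     # 10 ms frame, 250 Hz tone
```

With these assumed settings, the 40 ms frame recovers the 100 Hz tone, while the 10 ms frame cannot: one period of a 100 Hz tone is 80 samples, and the largest lag available in an 80-sample frame is 79. The same 10 ms frame still recovers the 250 Hz tone, whose period is only 32 samples. This is one simple reason a low-pitched voice may need a longer frame than a high-pitched one, before noise is even considered.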

