
Single Channel Speech Enhancement Using Adaptive Soft-Thresholding with Bivariate EMD

DOI: 10.1155/2013/724378


Abstract:

This paper presents a novel data-adaptive thresholding approach to single-channel speech enhancement. The noisy speech signal and fractional Gaussian noise (fGn) are combined to form a complex signal. The fGn is generated using a noise variance roughly estimated from the noisy speech signal. Bivariate empirical mode decomposition (bEMD) is employed to decompose the complex signal into a finite number of complex-valued intrinsic mode functions (IMFs). The real and imaginary parts of the IMFs represent the IMFs of the observed speech and of the fGn, respectively. Each IMF is divided into short time frames for local processing. The variance of the fGn IMF calculated within a frame is used as the reference term to classify the corresponding noisy speech frame as noise dominant or signal dominant. Only the noise-dominant frames are soft-thresholded to reduce the noise effects. Then all the frames, as well as the IMFs of speech, are combined, yielding the enhanced speech signal. The experimental results show improved performance of the proposed algorithm compared to recently reported methods.

1. Introduction

Research on speech enhancement is motivated by the rapidly growing market for speech communication applications, such as teleconferencing, hands-free telephony, hearing aids, and speech recognition. In hands-free communication systems, one or more microphones are typically placed at some distance from the speaker. In an adverse acoustic environment, various noise sources corrupt the speech signal. Although the human auditory system is remarkably robust in most adverse situations, noise heavily degrades the performance of automatic speech recognition (ASR) systems. The performance of an ASR system trained in one specific environment drops considerably when it is used in another acoustic environment [1]. Several approaches have already been proposed to improve speech enhancement results.
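As a rough illustration of the frame-wise step summarized in the abstract, the sketch below soft-thresholds only those frames of one IMF that appear noise dominant. It is not the authors' implementation: the abstract does not state the classification rule or the threshold value, so the variance comparison, the threshold (the reference standard deviation), and the names `soft_threshold`, `enhance_imf`, and `frame_len` are all assumptions made for illustration. The bEMD decomposition itself is taken as given.

```python
import numpy as np

def soft_threshold(x, t):
    """Standard soft-thresholding: shrink every sample toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def enhance_imf(speech_imf, noise_imf, frame_len=256):
    """Soft-threshold only the frames of one IMF that look noise dominant.

    speech_imf: real part of a complex IMF (IMF of the noisy speech)
    noise_imf:  imaginary part of the same IMF (IMF of the reference fGn)
    frame_len:  hypothetical frame length in samples
    """
    out = np.asarray(speech_imf, dtype=float).copy()
    noise_imf = np.asarray(noise_imf, dtype=float)
    for start in range(0, len(out), frame_len):
        frame = out[start:start + frame_len]
        ref_var = np.var(noise_imf[start:start + frame_len])
        # Assumed rule: a frame whose variance does not exceed the
        # reference variance is treated as noise dominant.
        if np.var(frame) <= ref_var:
            # Assumed threshold: the reference standard deviation.
            out[start:start + frame_len] = soft_threshold(frame, np.sqrt(ref_var))
    return out
```

Under these assumptions, the enhanced signal would be obtained by applying `enhance_imf` to the real and imaginary parts of every complex IMF and summing the processed speech IMFs.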
Although the microphone-array-based approach gives better results, the speech processing research community is also trying to reduce the number of microphones (channels). Spectral subtraction is one of the early methods for reducing noise effects in observed speech signals. In this method, noise reduction is achieved by appropriately adjusting the set of spectral magnitudes [2]. Its basic requirement is the noise spectrum, which is determined from the nonspeech segments [3]. In such a single-channel speech enhancement system, residual noise is a common issue. It decreases the speech intelligibility and hence further processing is
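The spectral subtraction idea mentioned above can be sketched in a few lines. This is a minimal illustration and not a reproduction of the method in [2]: it uses non-overlapping frames without windowing, and it assumes that the first `n_noise_frames` frames contain no speech so that they can serve as the noise estimate. The function name and both parameters are hypothetical.

```python
import numpy as np

def spectral_subtraction(noisy, frame_len=256, n_noise_frames=5):
    """Magnitude spectral subtraction on non-overlapping frames.

    Assumes (hypothetically) that the first n_noise_frames frames
    contain no speech, so they can serve as the noise estimate.
    """
    noisy = np.asarray(noisy, dtype=float)
    n_frames = len(noisy) // frame_len
    frames = noisy[:n_frames * frame_len].reshape(n_frames, frame_len)
    spectra = np.fft.rfft(frames, axis=1)
    magnitude, phase = np.abs(spectra), np.angle(spectra)
    # Average magnitude spectrum of the assumed nonspeech frames.
    noise_mag = magnitude[:n_noise_frames].mean(axis=0)
    # Subtract the noise estimate and clip negative magnitudes to zero.
    clean_mag = np.maximum(magnitude - noise_mag, 0.0)
    # Reuse the noisy phase, since only the magnitudes are adjusted.
    enhanced = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    return enhanced.reshape(-1)
```

With real noise, the subtraction is only correct on average, and clipping negative magnitudes to zero leaves isolated spectral peaks behind. This is one source of the residual noise discussed above.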

References

[1]  S. F. Beaufays and V. Digalakis, “Training data clustering for improved speech recognition,” in Proceedings of the EUROSPEECH, Madrid, Spain, 1995.
[2]  S. F. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 27, no. 2, pp. 113–120, 1979.
[3]  Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 6, pp. 1109–1121, 1984.
[4]  E. Deger, M. K. I. Molla, K. Hirose, N. Minematsu, and M. K. Hasan, “Speech enhancement using soft thresholding with DCT-EMD based hybrid algorithm,” in Proceedings of the EUSIPCO, September 2007.
[5]  M. E. Hamid, S. Das, K. Hirose, and M. K. I. Molla, “Speech enhancement using EMD based adaptive soft-thresholding (EMD-ADT),” International Journal of Signal Processing, Image Processing and Pattern Recognition, vol. 5, no. 2, pp. 1–16, 2012.
[6]  N. E. Huang, Z. Shen, S. Long et al., “The empirical mode decomposition and Hilbert spectrum for non-linear and non-stationary time series analysis,” Proceedings of the Royal Society A, vol. 454, pp. 903–995, 1998.
[7]  G. Rilling, P. Flandrin, P. Goncalves, and J. M. Lilly, “Bivariate empirical mode decomposition,” IEEE Signal Processing Letters, vol. 14, no. 12, pp. 936–939, 2007.
[8]  T. Tanaka and D. P. Mandic, “Complex empirical mode decomposition,” IEEE Signal Processing Letters, vol. 14, no. 2, pp. 101–104, 2007.
[9]  Z. Wu and N. E. Huang, “A study of the characteristics of white noise using the empirical mode decomposition method,” Proceedings of the Royal Society A, vol. 460, no. 2046, pp. 1597–1611, 2004.
[10]  P. Flandrin, G. Rilling, and P. Gonçalvès, “Empirical mode decomposition as a filter bank,” IEEE Signal Processing Letters, vol. 11, no. 2, pp. 112–114, 2004.
[11]  M. K. I. Molla, T. Tanaka, T. M. Rutkowski, and A. Cichocki, “Separation of EOG artifacts from EEG signals using bivariate EMD,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '10), pp. 562–565, March 2010.
[12]  M. K. I. Molla, P. R. Ghosh, and K. Hirose, “Bivariate EMD-based data adaptive approach to the analysis of climate variability,” Discrete Dynamics in Nature and Society, vol. 2011, Article ID 935034, 21 pages, 2011.
[13]  D. Looney and D. P. Mandic, “Multiscale image fusion using complex extensions of EMD,” IEEE Transactions on Signal Processing, vol. 57, no. 4, pp. 1626–1630, 2009.
[14]  M. B. Luca, S. Azou, G. Burel, and A. Serbanescu, “On exact Kalman filtering of polynomial systems,” IEEE Transactions on Circuits and Systems I, vol. 53, no. 6, pp. 1329–1340, 2006.
[15]  A. Kagan and L. A. Shepp, “Why the variance?” Statistics and Probability Letters, vol. 38, no. 4, pp. 329–333, 1998.
[16]  C. Park, D. Looney, P. Kidmose, M. Ungstrup, and D. P. Mandic, “Time-frequency analysis of EEG asymmetry using bivariate empirical mode decomposition,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 19, no. 4, pp. 366–373, 2011.
[17]  S. Salahuddin, S. Z. Al Islam, M. K. Hasan, and M. R. Khan, “Soft thresholding for DCT speech enhancement,” Electronics Letters, vol. 38, no. 24, pp. 1605–1607, 2002.
[18]  A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, “Perceptual evaluation of speech quality (PESQ): a new method for speech quality assessment of telephone networks and codecs,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 749–752, May 2001.
