This paper presents an algorithm for voiced/unvoiced classification of noisy speech based on two acoustic features of the speech signal. Short-time energy and short-time zero-crossing rate are among the most distinctive time-domain features for classifying the voicing activity of a speech signal into voiced and unvoiced segments. A new approach is developed in which a narrow-band speech signal is processed frame by frame using its spectrogram image. Two time-domain features, short-time energy (STE) and short-time zero-crossing rate (ZCR), are used to classify the voiced and unvoiced parts. In the first stage, each frame of the spectrogram under analysis is divided into three separate sub-bands, and their short-time energy ratio pattern is examined. An energy ratio pattern-matching lookup table is then used to classify the voicing activity. This stage successfully classifies patterns 1 through 4 but fails for the remaining patterns in the lookup table. The remaining patterns are therefore resolved in the second stage, where the frame-wise short-time average zero-crossing rate is compared with a threshold value. In this study, the threshold is calculated from the short-time average zero-crossing rate of white Gaussian noise (wGn). The accuracy of the proposed method is evaluated using both male and female speech waveforms under different signal-to-noise ratios (SNRs). Experimental results show that the proposed method achieves better accuracy than the conventional methods in the literature.
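The two features and the noise-derived threshold described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the equal-width split into three sub-bands, the number of noise frames, and the rule that a ZCR below the threshold indicates a voiced frame are all assumptions made for illustration. The energy ratio lookup table itself is not given in the abstract and is not reproduced here.

```python
import random


def frames(signal, frame_len, hop):
    """Split a sequence into overlapping frames (a trailing partial frame is dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Short-time energy (STE): sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Short-time ZCR: fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def subband_energy_ratios(power_spectrum, n_bands=3):
    """Energy share of each sub-band in one spectrogram frame.

    Assumption: the bands have equal width in bins; the last band
    absorbs any leftover bins.
    """
    width = len(power_spectrum) // n_bands
    bands = [sum(power_spectrum[k * width:(k + 1) * width])
             for k in range(n_bands - 1)]
    bands.append(sum(power_spectrum[(n_bands - 1) * width:]))
    total = sum(bands)
    return [b / total for b in bands] if total > 0 else [0.0] * n_bands


def wgn_zcr_threshold(frame_len, n_frames=200, seed=0):
    """Average ZCR over frames of synthetic white Gaussian noise.

    Assumption: n_frames and seed are illustrative choices only.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(n_frames):
        noise = [rng.gauss(0.0, 1.0) for _ in range(frame_len)]
        rates.append(zero_crossing_rate(noise))
    return sum(rates) / len(rates)


def classify_by_zcr(frame, threshold):
    """Second-stage decision (assumed rule): low ZCR means voiced."""
    return "voiced" if zero_crossing_rate(frame) < threshold else "unvoiced"
```

For zero-mean white Gaussian noise, adjacent samples are independent, so the expected ZCR is about 0.5. A strongly periodic frame, such as a low-frequency tone, therefore falls well below this threshold under the assumed decision rule.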
Published in | Science Journal of Circuits, Systems and Signal Processing (Volume 6, Issue 2) |
DOI | 10.11648/j.cssp.20170602.12 |
Page(s) | 11-17 |
Creative Commons |
This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
Copyright |
Copyright © The Author(s), 2017. Published by Science Publishing Group |
Voiced/Unvoiced Classification, Spectrogram Image, Short-time Energy Ratio, Energy Ratio Pattern, Short-time Zero-crossing Rate, White Gaussian Noise
APA Style
Kazi Mahmudul Hassan, Ekramul Hamid, Khademul Islam Molla. (2017). A Method for Voiced/Unvoiced Classification of Noisy Speech by Analyzing Time-Domain Features of Spectrogram Image. Science Journal of Circuits, Systems and Signal Processing, 6(2), 11-17. https://doi.org/10.11648/j.cssp.20170602.12