
Improving Speech Recognition Rate through Analysis Parameters

Open Access | May 2014

Language: English
Page range: 61 - 66
Published on: May 17, 2014
In partnership with: Paradigm Publishing Services
Publication frequency: 2 issues per year

© 2014 Deividas Eringis, Gintautas Tamulevičius, published by Riga Technical University
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.