
Improving Speech Recognition Rate through Analysis Parameters

Open Access | May 2014

Language: English
Page range: 61 - 66
Published on: May 17, 2014
In partnership with: Paradigm Publishing Services
Publication frequency: 2 issues per year

© 2014 Deividas Eringis, Gintautas Tamulevičius, published by Riga Technical University
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.