Pipelined language model construction for Polish speech recognition

Jerzy Sas; Andrzej Żołnierek

doi:10.2478/amcs-2013-0049


International Journal of Applied Mathematics and Computer Science

Volume 23 (2013): Issue 3 (September 2013)


Open Access


Abstract

The aim of the work described in this article is to develop and experimentally evaluate a consistent method of Language Model (LM) construction for Polish speech recognition. The proposed method takes into account the features and specific problems encountered in practical applications of speech recognition in Polish: rich inflection, a loose word order and the tendency for short word deletion. The LM is created in five stages. Each successive stage takes the model prepared at the previous stage and modifies or extends it so as to improve its properties. At the first stage, typical methods of LM smoothing are used to create the initial model; the four most frequently used methods of LM construction are considered here. At the second stage, the model is extended to take into account words indirectly co-occurring in the corpus. At the third stage, LM modifications are aimed at reducing short word deletion errors, which occur frequently in Polish speech recognition. The fourth stage extends the model by inserting words that were not observed in the corpus. Finally, the model is modified so as to ensure highly accurate recognition of very important utterances. The performance of the methods applied is tested in four language domains.
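The staged construction described above, in which each stage receives the model from the previous stage and returns a modified one, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the smoothing methods, so simple add-k smoothing of a bigram model stands in for stage one, and a plain vocabulary extension stands in for stage four. All function names (`build_initial_model`, `add_unseen_words`, `bigram_prob`, `run_pipeline`) and the toy corpus are hypothetical.

```python
from collections import Counter
from functools import reduce


def build_initial_model(sentences, k=1.0):
    """Stage-one stand-in: collect bigram counts for an add-k smoothed model."""
    history_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        # Count each token as a history (everything except the final token).
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens, tokens[1:]))
    return {"history": history_counts, "bigrams": bigram_counts,
            "vocab": vocab, "k": k}


def add_unseen_words(model, new_words):
    """Stage-four stand-in: add words that never occurred in the corpus."""
    extended = dict(model)  # shallow copy, the input model is left untouched
    extended["vocab"] = model["vocab"] | set(new_words)
    return extended


def bigram_prob(model, prev, word):
    """Add-k estimate of P(word | prev) over the model's current vocabulary."""
    k, vocab_size = model["k"], len(model["vocab"])
    numerator = model["bigrams"][(prev, word)] + k
    denominator = model["history"][prev] + k * vocab_size
    return numerator / denominator


def run_pipeline(model, stages):
    """Apply each stage in order; every stage maps a model to a new model."""
    return reduce(lambda current, stage: stage(current), stages, model)


# Assumed toy corpus, for illustration only.
corpus = ["ala ma kota", "ala ma psa"]
model = run_pipeline(
    build_initial_model(corpus),
    [lambda m: add_unseen_words(m, ["chomika"])],
)
print(bigram_prob(model, "ma", "kota"), bigram_prob(model, "ma", "chomika"))
```

Representing every stage as a function from model to model keeps the stages independent, so one of them can be swapped or skipped without touching the rest. In this sketch, the word added at the last stage receives a nonzero probability after any history only because of the add-k term.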
