TEDxSK and JumpSK: A New Slovak Speech Recognition Dedicated Corpus

Open Access | Jan 2018

References

  1. [1] Koctúr, T., Juhár, J., Viszlay, P., Staš, J., and Lojka, M. (2016). Unsupervised speech transcription and alignment based on two complementary ASR systems. In Proceedings of RADIOELEKTRONIKA 2016, pages 358–362, Košice, Slovakia. DOI: 10.1109/RADIOELEK.2016.7477435
  2. [2] Rousseau, A., Deléglise, P., and Estève, Y. (2012). TED-LIUM: An automatic speech recognition dedicated corpus. In Proceedings of LREC 2012, pages 125–129, Istanbul, Turkey.
  3. [3] Deléglise, P., Estève, Y., Meignier, S., and Merlin, T. (2009). Improvements to the LIUM French ASR system based on CMU Sphinx: What helps to significantly reduce the word error rate? In Proceedings of INTERSPEECH 2009, pages 2123–2126, Brighton, UK. DOI: 10.21437/Interspeech.2009-607
  4. [4] Žgank, A., Maučec, M. S., and Verdonik, D. (2016). The SI TEDx-UM speech database: A new Slovenian spoken language resource. In Proceedings of LREC 2016, pages 4670–4673, Portorož, Slovenia.
  5. [5] Rousseau, A., Deléglise, P., and Estève, Y. (2014). Enhancing the TED-LIUM corpus with selected data for language modeling and more TED talks. In Proceedings of LREC 2014, pages 3935–3939, Reykjavik, Iceland.
  6. [6] Leeuwis, E., Federico, M., and Cettolo, M. (2003). Language modeling and transcription of the TED corpus lectures. In Proceedings of ICASSP 2003, pages 232–235, Hong Kong, China. DOI: 10.1109/ICASSP.2003.1198760
  7. [7] Cettolo, M., Brugnara, F., and Federico, M. (2004). Advances in the automatic transcription of lectures. In Proceedings of ICASSP 2004, pages 769–772, Montreal, Canada. DOI: 10.1109/ICASSP.2004.1326099
  8. [8] Niesler, T. and Willett, D. (2002). Unsupervised language model adaptation for lecture speech transcription. In Proceedings of ICSLP 2002, pages 1413–1416, Denver, Colorado, USA. DOI: 10.21437/ICSLP.2002-63
  9. [9] Wölfel, M. and Berger, S. (2005). The ISL baseline lecture transcription system for the TED corpus. Tech. Rep., Karlsruhe University, Germany.
  10. [10] Naptali, W. and Kawahara, T. (2012). Automatic transcription of TED talks. In Proceedings of the 6th Spoken Document Processing Workshop, SDPWS 2012, Toyohashi, Japan.
  11. [11] Bell, P., Yamamoto, H., Swietojanski, P., Wu, Y., McInnes, F., Hori, Ch., and Renals, S. (2013). A lecture transcription system combining neural network acoustic and language models. In Proceedings of INTERSPEECH 2013, pages 3081–3091, Lyon, France. DOI: 10.21437/Interspeech.2013-673
  12. [12] Nanjo, H., Shitaoka, K., and Kawahara, T. (2003). Automatic transformation of lecture transcription into document style using statistical framework. In Proceedings of ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition, SSPR 2003, Tokyo, Japan.
  13. [13] Hsu, B.-J. and Glass, J. (2009). Language model parameter estimation using user transcriptions. In Proceedings of ICASSP 2009, pages 4805–4808, Taipei, Taiwan. DOI: 10.1109/ICASSP.2009.4960706
  14. [14] Akita, Y., Watanabe, M., and Kawahara, T. (2012). Automatic transcription of lecture speech using language model based on speaking-style transformation of proceedings texts. In Proceedings of INTERSPEECH 2012, pages 2326–2329, Portland, Oregon, USA. DOI: 10.21437/Interspeech.2012-610
  15. [15] Viszlay, P., Staš, J., Koctúr, T., Lojka, M., and Juhár, J. (2016). An extension of the Slovak broadcast news corpus based on semi-automatic annotation. In Proceedings of LREC 2016, pages 4684–4687, Portorož, Slovenia.
  16. [16] Vavrek, J., Viszlay, P., Kiktová, E., Lojka, M., Juhár, J., and Čižmár, A. (2014). Query-by-example retrieval via fast sequential dynamic time warping algorithm. In Proceedings of the 37th International Conference on Telecommunications and Signal Processing, TSP 2014, pages 453–457, Berlin, Germany.
  17. [17] Staš, J., Viszlay, P., Lojka, M., Koctúr, T., Hládek, D., Kiktová, E., Pleva, M., and Juhár, J. (2015). Automatic subtitling system for transcription, archiving and indexing of Slovak audiovisual recordings. In Proceedings of the 7th Language & Technology Conference, LTC 2015, pages 186–191, Poznań, Poland.
  18. [18] Lee, A., Kawahara, T., and Shikano, K. (2001). Julius – An open source real-time large vocabulary recognition engine. In Proceedings of EUROSPEECH 2001, pages 1691–1694, Aalborg, Denmark. DOI: 10.21437/Eurospeech.2001-396
  19. [19] Lojka, M., Ondáš, S., Pleva, M., and Juhár, J. (2014). Multi-threaded parallel speech recognition for mobile applications. Journal of Electrical and Electronics Engineering, 7(1):81–86.
  20. [20] Rusko, M., Juhár, J., Trnka, M., Staš, J., Darjaa, S., Hládek, D., Sabo, R., Pleva, M., Ritomský, M., and Ondáš, S. (2016). Advances in the Slovak judicial domain dictation system. In Vetulani, Z., Uszkoreit, H., and Kubis, M., editors, Human Language Technology: Challenges for Computer Science and Linguistics, LNAI 9561, pages 55–67, Springer International Publishing Switzerland. DOI: 10.1007/978-3-319-43808-5_5
  21. [21] Koctúr, T., Staš, J., and Juhár, J. (2016). Unsupervised acoustic corpora building based on variable confidence measure thresholding. In Proceedings of the 58th International Symposium ELMAR 2016, pages 31–34, Zadar, Croatia. DOI: 10.1109/ELMAR.2016.7731748
  22. [22] Darjaa, S., Cerňak, M., Trnka, M., and Rusko, M. (2011). Effective triphone mapping for acoustic modeling in speech recognition. In Proceedings of INTERSPEECH 2011, pages 1717–1720, Florence, Italy. DOI: 10.21437/Interspeech.2011-190
  23. [23] Stolcke, A. (2002). SRILM – An extensible language modeling toolkit. In Proceedings of ICSLP 2002, pages 901–904, Denver, Colorado, USA. DOI: 10.21437/ICSLP.2002-303
  24. [24] Staš, J. and Juhár, J. (2015). Modeling of the Slovak language for broadcast news transcription. Journal of Electrical and Electronics Engineering, 8(2):43–46.
  25. [25] Hládek, D., Ondáš, S., and Staš, J. (2014). Online natural language processing of the Slovak language. In Proceedings of the 5th IEEE International Conference on Cognitive InfoCommunications, CogInfoCom 2014, pages 315–316, Vietri sul Mare, Italy. DOI: 10.1109/CogInfoCom.2014.7020469
  26. [26] Fiscus, J. G. (1997). A post-processing system to yield reduced word error rates: Recognizer output voting error reduction (ROVER). In Proceedings of ASRU 1997, pages 347–352, Santa Barbara, CA, USA. DOI: 10.1109/ASRU.1997.659110
  27. [27] Lojka, M. and Juhár, J. (2014). Hypothesis combination for Slovak dictation speech recognition. In Proceedings of the 56th International Symposium ELMAR 2014, pages 43–46, Zadar, Croatia. DOI: 10.1109/ELMAR.2014.6923311
  28. [28] Staš, J., Hládek, D., and Juhár, J. (2016). Adding filled pauses and disfluent events into language models for speech recognition. In Proceedings of the 7th IEEE International Conference on Cognitive InfoCommunications, CogInfoCom 2016, Wrocław, Poland. DOI: 10.1109/CogInfoCom.2016.7804538
DOI: https://doi.org/10.1515/jazcas-2017-0044 | Journal eISSN: 1338-4287 | Journal ISSN: 0021-5597
Language: English
Page range: 346–354
Published on: Jan 24, 2018
Published by: Slovak Academy of Sciences, Mathematical Institute
In partnership with: Paradigm Publishing Services
Publication frequency: 2 issues per year

© 2018 Ján Staš, Daniel Hládek, Peter Viszlay, Tomáš Koctúr, published by Slovak Academy of Sciences, Mathematical Institute
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.