TEDxSK and JumpSK: A New Slovak Speech Recognition Dedicated Corpus

Ján Staš; Daniel Hládek; Peter Viszlay; Tomáš Koctúr

doi:10.1515/jazcas-2017-0044

Journal of Linguistics/Jazykovedný časopis

Volume 68 (2017): Issue 2 (December 2017)

Open Access

Jan 2018

[1] Koctúr, T., Juhár, J., Viszlay, P., Staš, J., and Lojka, M. (2016). Unsupervised speech transcription and alignment based on two complementary ASR systems. In Proceedings of RADIOELEKTRONIKA 2016, pages 358–362, Košice, Slovakia. doi:10.1109/RADIOELEK.2016.7477435
[2] Rousseau, A., Deléglise, P., and Estève, Y. (2012). TED-LIUM: An automatic speech recognition dedicated corpus. In Proceedings of LREC 2012, pages 125–129, Istanbul, Turkey.
[3] Deléglise, P., Estève, Y., Meignier, S., and Merlin, T. (2009). Improvements to the LIUM French ASR system based on CMU Sphinx: What helps to significantly reduce the word error rate? In Proceedings of INTERSPEECH 2009, pages 2123–2126, Brighton, UK. doi:10.21437/Interspeech.2009-607
[4] Žgank, A., Maučec, M. S., and Verdonik, D. (2016). The SI TEDx-UM speech database: A new Slovenian spoken language resource. In Proceedings of LREC 2016, pages 4670–4673, Portorož, Slovenia.
[5] Rousseau, A., Deléglise, P., and Estève, Y. (2014). Enhancing the TED-LIUM corpus with selected data for language modeling and more TED talks. In Proceedings of LREC 2014, pages 3935–3939, Reykjavik, Iceland.
[6] Leeuwis, E., Federico, M., and Cettolo, M. (2003). Language modeling and transcription of the TED corpus lectures. In Proceedings of ICASSP 2003, pages 232–235, Hong Kong, China. doi:10.1109/ICASSP.2003.1198760
[7] Cettolo, M., Brugnara, F., and Federico, M. (2004). Advances in the automatic transcription of lectures. In Proceedings of ICASSP 2004, pages 769–772, Montreal, Canada. doi:10.1109/ICASSP.2004.1326099
[8] Niesler, T. and Willett, D. (2002). Unsupervised language model adaptation for lecture speech transcription. In Proceedings of ICSLP 2002, pages 1413–1416, Denver, Colorado, USA. doi:10.21437/ICSLP.2002-63
[9] Wölfel, M. and Berger, S. (2005). The ISL baseline lecture transcription system for the TED corpus. Tech. Rep., Karlsruhe University, Germany.
[10] Naptali, W. and Kawahara, T. (2012). Automatic transcription of TED talks. In Proceedings of the 6th Spoken Document Processing Workshop, SDPWS 2012, Toyohashi, Japan.
[11] Bell, P., Yamamoto, H., Swietojanski, P., Wu, Y., McInnes, F., Hori, Ch., and Renals, S. (2013). A lecture transcription system combining neural network acoustic and language models. In Proceedings of INTERSPEECH 2013, pages 3081–3091, Lyon, France. doi:10.21437/Interspeech.2013-673
[12] Nanjo, H., Shitaoka, K., and Kawahara, T. (2003). Automatic transformation of lecture transcription into document style using statistical framework. In Proceedings of ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition, SSPR 2003, Tokyo, Japan.
[13] Hsu, B.-J. and Glass, J. (2009). Language model parameter estimation using user transcriptions. In Proceedings of ICASSP 2009, pages 4805–4808, Taipei, Taiwan. doi:10.1109/ICASSP.2009.4960706
[14] Akita, Y., Watanabe, M., and Kawahara, T. (2012). Automatic transcription of lecture speech using language model based on speaking-style transformation of proceedings texts. In Proceedings of INTERSPEECH 2012, pages 2326–2329, Portland, Oregon, USA. doi:10.21437/Interspeech.2012-610
[15] Viszlay, P., Staš, J., Koctúr, T., Lojka, M., and Juhár, J. (2016). An extension of the Slovak broadcast news corpus based on semi-automatic annotation. In Proceedings of LREC 2016, pages 4684–4687, Portorož, Slovenia.
[16] Vavrek, J., Viszlay, P., Kiktová, E., Lojka, M., Juhár, J., and Čižmár, A. (2014). Query-by-example retrieval via fast sequential dynamic time warping algorithm. In Proceedings of the 37th International Conference on Telecommunications and Signal Processing, TSP 2014, pages 453–457, Berlin, Germany.
[17] Staš, J., Viszlay, P., Lojka, M., Koctúr, T., Hládek, D., Kiktová, E., Pleva, M., and Juhár, J. (2015). Automatic subtitling system for transcription, archiving and indexing of Slovak audiovisual recordings. In Proceedings of the 7th Language & Technology Conference, LTC 2015, pages 186–191, Poznań, Poland.
[18] Lee, A., Kawahara, T., and Shikano, K. (2001). Julius – An open source real-time large vocabulary recognition engine. In Proceedings of EUROSPEECH 2001, pages 1691–1694, Aalborg, Denmark. doi:10.21437/Eurospeech.2001-396
[19] Lojka, M., Ondáš, S., Pleva, M., and Juhár, J. (2014). Multi-threaded parallel speech recognition for mobile applications. Journal of Electrical and Electronics Engineering, 7(1):81–86.
[20] Rusko, M., Juhár, J., Trnka, M., Staš, J., Darjaa, S., Hládek, D., Sabo, R., Pleva, M., Ritomský, M., and Ondáš, S. (2016). Advances in the Slovak judicial domain dictation system. In Vetulani, Z., Uszkoreit, H., and Kubis, M., editors, Human Language Technology: Challenges for Computer Science and Linguistics, LNAI 9561, pages 55–67, Springer International Publishing Switzerland. doi:10.1007/978-3-319-43808-5_5
[21] Koctúr, T., Staš, J., and Juhár, J. (2016). Unsupervised acoustic corpora building based on variable confidence measure thresholding. In Proceedings of the 58th International Symposium ELMAR 2016, pages 31–34, Zadar, Croatia. doi:10.1109/ELMAR.2016.7731748
[22] Darjaa, S., Cerňak, M., Trnka, M., and Rusko, M. (2011). Effective triphone mapping for acoustic modeling in speech recognition. In Proceedings of INTERSPEECH 2011, pages 1717–1720, Florence, Italy. doi:10.21437/Interspeech.2011-190
[23] Stolcke, A. (2002). SRILM – An extensible language modeling toolkit. In Proceedings of ICSLP 2002, pages 901–904, Denver, Colorado, USA. doi:10.21437/ICSLP.2002-303
[24] Staš, J. and Juhár, J. (2015). Modeling of the Slovak language for broadcast news transcription. Journal of Electrical and Electronics Engineering, 8(2):43–46.
[25] Hládek, D., Ondáš, S., and Staš, J. (2014). Online natural language processing of the Slovak language. In Proceedings of the 5th IEEE International Conference on Cognitive InfoCommunications, CogInfoCom 2014, pages 315–316, Vietri sul Mare, Italy. doi:10.1109/CogInfoCom.2014.7020469
[26] Fiscus, J. G. (1997). A post-processing system to yield reduced word error rates: Recognizer output voting error reduction (ROVER). In Proceedings of ASRU 1997, pages 347–352, Santa Barbara, CA, USA. doi:10.1109/ASRU.1997.659110
[27] Lojka, M. and Juhár, J. (2014). Hypothesis combination for Slovak dictation speech recognition. In Proceedings of the 56th International Symposium ELMAR 2014, pages 43–46, Zadar, Croatia. doi:10.1109/ELMAR.2014.6923311
[28] Staš, J., Hládek, D., and Juhár, J. (2016). Adding filled pauses and disfluent events into language models for speech recognition. In Proceedings of the 7th IEEE International Conference on Cognitive InfoCommunications, CogInfoCom 2016, Wroclaw, Poland. doi:10.1109/CogInfoCom.2016.7804538

DOI: https://doi.org/10.1515/jazcas-2017-0044 | Journal eISSN: 1338-4287 | Journal ISSN: 0021-5597

Language: English

Page range: 346–354

Published on: Jan 24, 2018

Published by: Slovak Academy of Sciences, Ľudovít Štúr Institute of Linguistics

In partnership with: Paradigm Publishing Services

Publication frequency: 3 issues per year

Keywords: automatic annotation, speech recognition, speech corpus

Related subjects: Linguistics and semiotics; Theoretical frameworks and disciplines; Linguistics, other

© 2018 Ján Staš, Daniel Hládek, Peter Viszlay, Tomáš Koctúr, published by Slovak Academy of Sciences, Ľudovít Štúr Institute of Linguistics
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
