Running Automatic Speech Recognition (ASR) Model in the Context of Real Time Data Streaming Architecture

Necula, Robert Cristian; Craciun, Pavel-Cristian

doi:10.2478/picbe-2025-0101

References

Kang G. Shin and Parameswaran Ramanathan. (1994). Real-Time Computing: A New Discipline of Computer Science and Engineering. Proceedings of the IEEE, vol. 82, no. 1, pp. 6-24.
Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever. (2022). Robust Speech Recognition via Large-Scale Weak Supervision. arXiv:2212.04356.
Likhomanenko, T., Xu, Q., Pratap, V., Tomasello, P., Kahn, J., Avidov, G., Collobert, R., and Synnaeve, G. (2020). Rethinking evaluation in ASR: Are our models robust enough? arXiv preprint arXiv:2010.11745.
Baevski, A., Zhou, H., Mohamed, A., and Auli, M. (2020). wav2vec 2.0: A framework for self-supervised learning of speech representations. arXiv:2006.11477.
Chan, W., Park, D., Lee, C., Zhang, Y., Le, Q., and Norouzi, M. (2021). SpeechStew: Simply mix all available speech recognition data to train one large neural network. arXiv preprint arXiv:2104.02133.
Zhang, Y., Park, D. S., Han, W., Qin, J., Gulati, A., Shor, J., Jansen, A., Xu, Y., Huang, Y., Wang, S., et al. (2021). BigSSL: Exploring the frontier of large-scale semi-supervised learning for automatic speech recognition. arXiv:2109.13226.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems, pp. 5998–6008.
Valk, J. and Alumäe, T. (2021). VoxLingua107: a dataset for spoken language recognition. In 2021 IEEE Spoken Language Technology Workshop (SLT), pp. 652–658. IEEE.
Sanchit Gandhi, Patrick von Platen & Alexander M. Rush. (2023). Distil-Whisper: Robust knowledge distillation via large-scale pseudo labelling. arXiv:2311.00430.
Nicolas Patry. (2022). Making automatic speech recognition work on large files with Wav2Vec2 in Transformers. https://huggingface.co/blog/asr-chunking. Accessed: 25 Nov. 2024.
H. Nanjo and T. Kawahara. (2005). A new ASR evaluation measure and minimum Bayes-risk decoding for open-domain speech understanding.
Hendrycks, D., Liu, X., Wallace, E., Dziedzic, A., Krishnan, R., and Song, D. (2020). Pretrained transformers improve out-of-distribution robustness. arXiv preprint arXiv:2004.06100.
Research Institute for Artificial Intelligence “Mihai Drăgănescu”, Romanian Academy Web, Romanian datasets, http://www.racai.ro
