Running Automatic Speech Recognition (ASR) Model in the Context of Real Time Data Streaming Architecture

Open Access | Jul 2025

References

  1. Kang G. Shin and Parameswaran Ramanathan. (1994). Real-Time Computing: A New Discipline of Computer Science and Engineering. Proceedings of the IEEE, vol. 82, no. 1, pp. 6-24.
  2. Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever. (2022). Robust Speech Recognition via Large-Scale Weak Supervision. arXiv:2212.04356.
  3. Likhomanenko, T., Xu, Q., Pratap, V., Tomasello, P., Kahn, J., Avidov, G., Collobert, R., and Synnaeve, G. (2020). Rethinking evaluation in ASR: Are our models robust enough? arXiv preprint arXiv:2010.11745.
  4. Baevski, A., Zhou, H., Mohamed, A., and Auli, M. (2020). wav2vec 2.0: A framework for self-supervised learning of speech representations. arXiv:2006.11477.
  5. Chan, W., Park, D., Lee, C., Zhang, Y., Le, Q., and Norouzi, M. (2021). SpeechStew: Simply mix all available speech recognition data to train one large neural network. arXiv preprint arXiv:2104.02133.
  6. Zhang, Y., Park, D. S., Han, W., Qin, J., Gulati, A., Shor, J., Jansen, A., Xu, Y., Huang, Y., Wang, S., et al. (2021). BigSSL: Exploring the frontier of large-scale semi-supervised learning for automatic speech recognition. arXiv:2109.13226.
  7. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems, pp. 5998–6008.
  8. Valk, J. and Alumäe, T. (2021). VoxLingua107: A dataset for spoken language recognition. In 2021 IEEE Spoken Language Technology Workshop (SLT), pp. 652–658. IEEE.
  9. Sanchit Gandhi, Patrick von Platen & Alexander M. Rush. (2023). Distil-Whisper: Robust knowledge distillation via large-scale pseudo labelling. arXiv:2311.00430.
  10. Nicolas Patry. (2022). Making automatic speech recognition work on large files with Wav2Vec2 in Transformers. https://huggingface.co/blog/asr-chunking. Accessed: 25 Nov. 2024.
  11. H. Nanjo and T. Kawahara. (2005). A new ASR evaluation measure and minimum Bayes-risk decoding for open-domain speech understanding.
  12. Hendrycks, D., Liu, X., Wallace, E., Dziedzic, A., Krishnan, R., and Song, D. (2020). Pretrained transformers improve out-of-distribution robustness. arXiv preprint arXiv:2004.06100.
  13. Research Institute for Artificial Intelligence “Mihai Drăgănescu”, Romanian Academy. Romanian datasets. http://www.racai.ro
Language: English
Page range: 1282–1293
Published on: Jul 24, 2025
Published by: The Bucharest University of Economic Studies
In partnership with: Paradigm Publishing Services
Publication frequency: 1 time per year

© 2025 Robert Cristian Necula, Pavel-Cristian Craciun, published by The Bucharest University of Economic Studies
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.