
Enhanced LSTM network with semi-supervised learning and data augmentation for low-resource ASR

Open Access | Mar 2025

References

  1. Alexei Baevski, Yuhao Zhou, Abdelrahman Mohamed, and Michael Auli. wav2vec 2.0: A framework for self-supervised learning of speech representations. Advances in Neural Information Processing Systems, 33:12449–12460, 2020.
  2. Anton Ragni, Kate M Knill, Shakti P Rath, and Mark JF Gales. Data augmentation for low resource languages. In INTERSPEECH 2014: 15th annual conference of the international speech communication association, pages 810–814. International Speech Communication Association (ISCA), 2014.
  3. Shiyu Zhou, Shuang Xu, and Bo Xu. Multilingual end-to-end speech recognition with a single transformer on low-resource languages. arXiv preprint arXiv:1806.05059, 2018.
  4. Cheng Yi, Jianzhong Wang, Ning Cheng, Shiyu Zhou, and Bo Xu. Applying wav2vec2.0 to speech recognition in various low-resource languages. arXiv preprint arXiv:2012.12121, 2020.
  5. Satwinder Singh, Ruili Wang, and Feng Hou. Improved meta learning for low resource speech recognition. In ICASSP 2022–2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4798–4802. IEEE, 2022.
  6. Ankit Kumar and Rajesh Kumar Aggarwal. A hybrid CNN-LiGRU acoustic modeling using raw waveform SincNet for Hindi ASR. Computer Science, 21(4), 2020.
  7. A Kumar, T Choudhary, M Dua, and M Sabharwal. Hybrid end-to-end architecture for Hindi speech recognition system. In Proceedings of the International Conference on Paradigms of Communication, Computing and Data Sciences: PCCDS 2021, pages 267–276. Springer, 2022.
  8. Ankit Kumar and Rajesh K Aggarwal. An investigation of multilingual TDNN-BLSTM acoustic modeling for Hindi speech recognition. International Journal of Sensors Wireless Communications and Control, 12(1):19–31, 2022.
  9. Ali Bou Nassif, Ismail Shahin, Imtinan Attili, Mohammad Azzeh, and Khaled Shaalan. Speech recognition using deep neural networks: A systematic review. IEEE Access, 7:19143–19165, 2019.
  10. Martijn Bartelds, Nay San, Bradley McDonnell, Dan Jurafsky, and Martijn Wieling. Making more of little data: Improving low-resource automatic speech recognition using data augmentation. arXiv preprint arXiv:2305.10951, 2023.
  11. Ankit Kumar and Rajesh Kumar Aggarwal. An exploration of semi-supervised and language-adversarial transfer learning using hybrid acoustic model for Hindi speech recognition. Journal of Reliable Intelligent Environments, 8(2):117–132, 2022.
  12. Jacob Kahn, Ann Lee, and Awni Hannun. Self-training for end-to-end speech recognition. In ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 7084–7088. IEEE, 2020.
  13. Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, and Takaaki Hori. Momentum pseudo-labeling for semi-supervised speech recognition. arXiv preprint arXiv:2106.08922, 2021.
  14. Daniel S Park, Yu Zhang, Ye Jia, Wei Han, Chung-Cheng Chiu, Bo Li, Yonghui Wu, and Quoc V Le. Improved noisy student training for automatic speech recognition. arXiv preprint arXiv:2005.09629, 2020.
  15. Han Zhu, Dongji Gao, Gaofeng Cheng, Daniel Povey, Pengyuan Zhang, and Yonghong Yan. Alternative pseudo-labeling for semi-supervised automatic speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023.
  16. Alexei Baevski, Yuhao Zhou, Abdelrahman Mohamed, and Michael Auli. wav2vec 2.0: A framework for self-supervised learning of speech representations. Advances in Neural Information Processing Systems, 33:12449–12460, 2020.
  17. Julia Mainzinger. Fine-tuning ASR models for very low-resource languages: A study on Mvskoke. Master's thesis, University of Washington, 2024.
  18. Robert Jimerson, Zoey Liu, and Emily Prud'Hommeaux. An (unhelpful) guide to selecting the best ASR architecture for your under-resourced language. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 1008–1016, 2023.
  19. Shiyue Zhang, Ben Frey, and Mohit Bansal. How can NLP help revitalize endangered languages? A case study and roadmap for the Cherokee language. arXiv preprint arXiv:2204.11909, 2022.
  20. Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, et al. Scaling speech technology to 1,000+ languages. Journal of Machine Learning Research, 25(97):1–52, 2024.
  21. Marieke Meelen, Alexander O'Neill, and Rolando Coto-Solano. End-to-end speech recognition for endangered languages of Nepal. In Proceedings of the Seventh Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 83–93, 2024.
  22. Panji Arisaputra, Alif Tri Handoyo, and Amalia Zahra. XLS-R deep learning model for multilingual ASR on low-resource languages: Indonesian, Javanese, and Sundanese. arXiv preprint arXiv:2401.06832, 2024.
  23. Siqing Qin, Longbiao Wang, Sheng Li, Jianwu Dang, and Lixin Pan. Improving low-resource Tibetan end-to-end ASR by multilingual and multilevel unit modeling. EURASIP Journal on Audio, Speech, and Music Processing, 2022(1):2, 2022.
  24. Kaushal Bhogale, Abhigyan Raman, Tahir Javed, Sumanth Doddapaneni, Anoop Kunchukuttan, Pratyush Kumar, and Mitesh M Khapra. Effectiveness of mining audio and text pairs from public data for improving ASR systems for low-resource languages. In ICASSP 2023–2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2023.
  25. Zoey Liu, Justin Spence, and Emily Prud'Hommeaux. Studying the impact of language model size for low-resource ASR. In Proceedings of the Sixth Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 77–83, 2023.
  26. Gueorgui Pironkov, Sean UN Wood, and Stéphane Dupont. Hybrid-task learning for robust automatic speech recognition. Computer Speech & Language, 64:101103, 2020.
  27. Mohamed Tamazin, Ahmed Gouda, and Mohamed Khedr. Enhanced automatic speech recognition system based on enhancing power-normalized cepstral coefficients. Applied Sciences, 9(10):2166, 2019.
  28. Syed Shahnawazuddin, KT Deepak, Gayadhar Pradhan, and Rohit Sinha. Enhancing noise and pitch robustness of children's asr. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5225–5229. IEEE, 2017.
  29. Jiri Malek, Jindrich Zdansky, and Petr Cerva. Robust automatic recognition of speech with background music. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5210–5214. IEEE, 2017.
  30. Sheng-Chieh Lee, Jhing-Fa Wang, and Miao-Hia Chen. Threshold-based noise detection and reduction for automatic speech recognition system in human-robot interactions. Sensors, 18(7):2068, 2018.
  31. Daniel S Park, Yu Zhang, Ye Jia, Wei Han, Chung-Cheng Chiu, Bo Li, Yonghui Wu, and Quoc V Le. Improved noisy student training for automatic speech recognition. arXiv preprint arXiv:2005.09629, 2020.
  32. Satyender Jaglan, Sanjeev Kumar Dhull, and Krishna Kant Singh. Tertiary wavelet model based automatic epilepsy classification system. International Journal of Intelligent Unmanned Systems, 11(1):166–181, 2023.
  33. Yuzong Liu and Katrin Kirchhoff. Graph-based semi-supervised learning for acoustic modeling in automatic speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(11):1946–1956, 2016.
  34. Michael I Mandel and Jon Barker. Multichannel spatial clustering for robust far-field automatic speech recognition in mismatched conditions. In INTERSPEECH, pages 1991–1995, 2016.
  35. Naoki Hirayama, Koichiro Yoshino, Katsutoshi Itoyama, Shinsuke Mori, and Hiroshi G Okuno. Automatic speech recognition for mixed dialect utterances by mixing dialect language models. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(2):373–382, 2015.
  36. Delu Zeng, Minyu Liao, Mohammad Tavakolian, Yulan Guo, Bolei Zhou, Dewen Hu, Matti Pietikäinen, and Li Liu. Deep learning for scene classification: A survey. arXiv preprint arXiv:2101.10531, 2021.
  37. Harveen Singh Chadha, Anirudh Gupta, Priyanshi Shah, Neeraj Chhimwal, Ankur Dhuriya, Rishabh Gaur, and Vivek Raghavan. Vakyansh: ASR toolkit for low resource Indic languages. arXiv preprint arXiv:2203.16512, 2022.
  38. Jaehyeon Kim, Sungwon Kim, Jungil Kong, and Sungroh Yoon. Glow-TTS: A generative flow for text-to-speech via monotonic alignment search. Advances in Neural Information Processing Systems, 33:8067–8077, 2020.
  39. Jungil Kong, Jaehyeon Kim, and Jaekyoung Bae. HiFi-GAN: Generative adversarial networks for efficient and high fidelity speech synthesis. Advances in Neural Information Processing Systems, 33:17022–17033, 2020.
  40. Lori Lamel, Jean-Luc Gauvain, and Gilles Adda. Lightly supervised and unsupervised acoustic model training. Computer Speech & Language, 16(1):115–129, 2002.
  41. Ho Yin Chan and Phil Woodland. Improving broadcast news transcription by lightly supervised discriminative training. In 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 1, pages I–737. IEEE, 2004.
  42. Vimal Manohar, Hossein Hadian, Daniel Povey, and Sanjeev Khudanpur. Semi-supervised training of acoustic models using lattice-free MMI. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4844–4848. IEEE, 2018.
  43. Thiago Fraga-Silva, Jean-Luc Gauvain, and Lori Lamel. Lattice-based unsupervised acoustic model training. In 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4656–4659. IEEE, 2011.
  44. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
Language: English
Submitted on: Nov 20, 2024 | Published on: Mar 4, 2025
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Tripti Choudhary, Vishal Goyal, Atul Bansal, published by Professor Subhas Chandra Mukhopadhyay
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.