Alexei Baevski, Yuhao Zhou, Abdelrahman Mohamed, and Michael Auli. wav2vec 2.0: A framework for self-supervised learning of speech representations. <em>Advances in neural information processing systems</em>, 33:12449–12460, 2020.
Anton Ragni, Kate M Knill, Shakti P Rath, and Mark JF Gales. Data augmentation for low resource languages. In <em>INTERSPEECH 2014: 15th annual conference of the international speech communication association</em>, pages 810–814. International Speech Communication Association (ISCA), 2014.
Shiyu Zhou, Shuang Xu, and Bo Xu. Multilingual end-to-end speech recognition with a single transformer on low-resource languages. <em>arXiv preprint arXiv:1806.05059</em>, 2018.
Cheng Yi, Jianzhong Wang, Ning Cheng, Shiyu Zhou, and Bo Xu. Applying wav2vec2.0 to speech recognition in various low-resource languages. <em>arXiv preprint arXiv:2012.12121</em>, 2020.
Satwinder Singh, Ruili Wang, and Feng Hou. Improved meta learning for low resource speech recognition. In <em>ICASSP 2022–2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</em>, pages 4798–4802. IEEE, 2022.
Ankit Kumar and Rajesh Kumar Aggarwal. A hybrid CNN-LiGRU acoustic modeling using raw waveform SincNet for Hindi ASR. <em>Computer Science</em>, 21(4), 2020.
A Kumar, T Choudhary, M Dua, and M Sabharwal. Hybrid end-to-end architecture for Hindi speech recognition system. In <em>Proceedings of the International Conference on Paradigms of Communication, Computing and Data Sciences: PCCDS 2021</em>, pages 267–276. Springer, 2022.
Ankit Kumar and Rajesh K Aggarwal. An investigation of multilingual TDNN-BLSTM acoustic modeling for Hindi speech recognition. <em>International Journal of Sensors Wireless Communications and Control</em>, 12(1):19–31, 2022.
Ali Bou Nassif, Ismail Shahin, Imtinan Attili, Mohammad Azzeh, and Khaled Shaalan. Speech recognition using deep neural networks: A systematic review. <em>IEEE Access</em>, 7:19143–19165, 2019.
Martijn Bartelds, Nay San, Bradley McDonnell, Dan Jurafsky, and Martijn Wieling. Making more of little data: Improving low-resource automatic speech recognition using data augmentation. <em>arXiv preprint arXiv:2305.10951</em>, 2023.
Ankit Kumar and Rajesh Kumar Aggarwal. An exploration of semi-supervised and language-adversarial transfer learning using hybrid acoustic model for Hindi speech recognition. <em>Journal of Reliable Intelligent Environments</em>, 8(2):117–132, 2022.
Jacob Kahn, Ann Lee, and Awni Hannun. Self-training for end-to-end speech recognition. In <em>ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</em>, pages 7084–7088. IEEE, 2020.
Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, and Takaaki Hori. Momentum pseudo-labeling for semi-supervised speech recognition. <em>arXiv preprint arXiv:2106.08922</em>, 2021.
Daniel S Park, Yu Zhang, Ye Jia, Wei Han, Chung-Cheng Chiu, Bo Li, Yonghui Wu, and Quoc V Le. Improved noisy student training for automatic speech recognition. <em>arXiv preprint arXiv:2005.09629</em>, 2020.
Han Zhu, Dongji Gao, Gaofeng Cheng, Daniel Povey, Pengyuan Zhang, and Yonghong Yan. Alternative pseudo-labeling for semi-supervised automatic speech recognition. <em>IEEE/ACM Transactions on Audio, Speech, and Language Processing</em>, 2023.
Robert Jimerson, Zoey Liu, and Emily Prud'hommeaux. An (unhelpful) guide to selecting the best ASR architecture for your under-resourced language. In <em>Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)</em>, pages 1008–1016, 2023.
Shiyue Zhang, Ben Frey, and Mohit Bansal. How can NLP help revitalize endangered languages? A case study and roadmap for the Cherokee language. <em>arXiv preprint arXiv:2204.11909</em>, 2022.
Marieke Meelen, Alexander O'Neill, and Rolando Coto-Solano. End-to-end speech recognition for endangered languages of Nepal. In <em>Proceedings of the Seventh Workshop on the Use of Computational Methods in the Study of Endangered Languages</em>, pages 83–93, 2024.
Panji Arisaputra, Alif Tri Handoyo, and Amalia Zahra. XLS-R deep learning model for multilingual ASR on low-resource languages: Indonesian, Javanese, and Sundanese. <em>arXiv preprint arXiv:2401.06832</em>, 2024.
Siqing Qin, Longbiao Wang, Sheng Li, Jianwu Dang, and Lixin Pan. Improving low-resource Tibetan end-to-end ASR by multilingual and multilevel unit modeling. <em>EURASIP Journal on Audio, Speech, and Music Processing</em>, 2022(1):2, 2022.
Kaushal Bhogale, Abhigyan Raman, Tahir Javed, Sumanth Doddapaneni, Anoop Kunchukuttan, Pratyush Kumar, and Mitesh M Khapra. Effectiveness of mining audio and text pairs from public data for improving ASR systems for low-resource languages. In <em>ICASSP 2023–2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</em>, pages 1–5. IEEE, 2023.
Zoey Liu, Justin Spence, and Emily Prud'hommeaux. Studying the impact of language model size for low-resource ASR. In <em>Proceedings of the Sixth Workshop on the Use of Computational Methods in the Study of Endangered Languages</em>, pages 77–83, 2023.
Gueorgui Pironkov, Sean UN Wood, and Stéphane Dupont. Hybrid-task learning for robust automatic speech recognition. <em>Computer Speech & Language</em>, 64:101103, 2020.
Mohamed Tamazin, Ahmed Gouda, and Mohamed Khedr. Enhanced automatic speech recognition system based on enhancing power-normalized cepstral coefficients. <em>Applied Sciences</em>, 9(10):2166, 2019.
Syed Shahnawazuddin, KT Deepak, Gayadhar Pradhan, and Rohit Sinha. Enhancing noise and pitch robustness of children's ASR. In <em>2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</em>, pages 5225–5229. IEEE, 2017.
Jiri Malek, Jindrich Zdansky, and Petr Cerva. Robust automatic recognition of speech with background music. In <em>2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</em>, pages 5210–5214. IEEE, 2017.
Sheng-Chieh Lee, Jhing-Fa Wang, and Miao-Hia Chen. Threshold-based noise detection and reduction for automatic speech recognition system in human-robot interactions. <em>Sensors</em>, 18(7):2068, 2018.
Yuzong Liu and Katrin Kirchhoff. Graph-based semisupervised learning for acoustic modeling in automatic speech recognition. <em>IEEE/ACM Transactions on Audio, Speech, and Language Processing</em>, 24(11):1946–1956, 2016.
Michael I Mandel and Jon Barker. Multichannel spatial clustering for robust far-field automatic speech recognition in mismatched conditions. In <em>INTERSPEECH</em>, pages 1991–1995, 2016.
Naoki Hirayama, Koichiro Yoshino, Katsutoshi Itoyama, Shinsuke Mori, and Hiroshi G Okuno. Automatic speech recognition for mixed dialect utterances by mixing dialect language models. <em>IEEE/ACM Transactions on Audio, Speech, and Language Processing</em>, 23(2):373–382, 2015.
Delu Zeng, Minyu Liao, Mohammad Tavakolian, Yulan Guo, Bolei Zhou, Dewen Hu, Matti Pietikäinen, and Li Liu. Deep learning for scene classification: A survey. <em>arXiv preprint arXiv:2101.10531</em>, 2021.
Jaehyeon Kim, Sungwon Kim, Jungil Kong, and Sungroh Yoon. Glow-TTS: A generative flow for text-to-speech via monotonic alignment search. <em>Advances in Neural Information Processing Systems</em>, 33:8067–8077, 2020.
Jungil Kong, Jaehyeon Kim, and Jaekyoung Bae. HiFi-GAN: Generative adversarial networks for efficient and high fidelity speech synthesis. <em>Advances in Neural Information Processing Systems</em>, 33:17022–17033, 2020.
Lori Lamel, Jean-Luc Gauvain, and Gilles Adda. Lightly supervised and unsupervised acoustic model training. <em>Computer Speech & Language</em>, 16(1):115–129, 2002.
Ho Yin Chan and Phil Woodland. Improving broadcast news transcription by lightly supervised discriminative training. In <em>2004 IEEE International Conference on Acoustics, Speech, and Signal Processing</em>, volume 1, pages I-737. IEEE, 2004.
Vimal Manohar, Hossein Hadian, Daniel Povey, and Sanjeev Khudanpur. Semi-supervised training of acoustic models using lattice-free MMI. In <em>2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</em>, pages 4844–4848. IEEE, 2018.
Thiago Fraga-Silva, Jean-Luc Gauvain, and Lori Lamel. Lattice-based unsupervised acoustic model training. In <em>2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</em>, pages 4656–4659. IEEE, 2011.