Automatic speech recognition (ASR) is essential for building intelligent systems that process human speech accurately, and the need is most acute for low-resource languages. This study addresses the challenges ASR systems face in Indian languages, where data and resources are limited. The authors propose a novel three-step methodology that combines data augmentation and semi-supervised learning to enhance ASR performance. First, an enhanced long short-term memory (LSTM) network is used to train a baseline model on the limited labeled data. Next, synthetic data is generated and combined with the original recordings to refine the ASR model. Finally, semi-supervised training further boosts accuracy. Evaluations on Hindi, Marathi, and Odia demonstrate significant improvements over existing models.
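The abstract does not say how the semi-supervised step selects its training data. One common approach is confidence-filtered pseudo-labeling, sketched below purely as an illustration and not as the authors' method. The names `select_pseudo_labels`, `build_training_set`, and `toy_recognize`, the 0.9 threshold, and the toy utterances are all assumptions introduced here.

```python
from typing import Callable, List, Tuple

# Hypothetical recognizer interface: utterance ID -> (transcript, confidence in [0, 1]).
Recognizer = Callable[[str], Tuple[str, float]]
Example = Tuple[str, str]  # (utterance ID, transcript)


def select_pseudo_labels(
    recognize: Recognizer,
    unlabeled: List[str],
    threshold: float = 0.9,  # assumed cutoff, not taken from the paper
) -> List[Example]:
    """Transcribe unlabeled utterances and keep only the confident ones."""
    selected = []
    for utt in unlabeled:
        text, confidence = recognize(utt)
        if confidence >= threshold:
            selected.append((utt, text))
    return selected


def build_training_set(
    labeled: List[Example],
    synthetic: List[Example],
    pseudo: List[Example],
) -> List[Example]:
    """Pool human-labeled, augmented, and pseudo-labeled examples for retraining."""
    return list(labeled) + list(synthetic) + list(pseudo)


def toy_recognize(utt: str) -> Tuple[str, float]:
    """Stand-in for a trained model: a fixed lookup table of made-up outputs."""
    table = {
        "utt3": ("namaste", 0.95),
        "utt4": ("dhanyavad", 0.40),
    }
    return table[utt]


labeled = [("utt1", "shubh prabhat")]        # step 1: scarce labeled data
synthetic = [("utt1_aug", "shubh prabhat")]  # step 2: augmented copy, same transcript
pseudo = select_pseudo_labels(toy_recognize, ["utt3", "utt4"])  # step 3
training_set = build_training_set(labeled, synthetic, pseudo)
```

In this toy run only `utt3` clears the threshold, so the low-confidence transcript for `utt4` is kept out of the retraining pool. A lower threshold admits more data at the cost of noisier labels.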
© 2025 Tripti Choudhary, Vishal Goyal, Atul Bansal, published by Professor Subhas Chandra Mukhopadhyay
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.