
Enhanced LSTM network with semi-supervised learning and data augmentation for low-resource ASR

Open Access | Mar 2025

Figures & Tables

Figure 1:

The proposed methodology for the LSTM-based Transformer. LSTM, long short-term memory.

Figure 2:

LSTM block structure. LSTM, long short-term memory.

Figure 3:

Overview of the data augmentation strategy and training pipeline for the proposed model.

Figure 4:

Diagram of the proposed framework combining multilingual supervised training and semi-supervised training for Indian languages using untranscribed data. The proposed enhanced LSTM-Transformer architecture is used for supervised training (left); synthetic data from a TTS system is amalgamated (middle); semi-supervised training uses both transcribed and untranscribed data (right). LSTM, long short-term memory; TTS, text-to-speech.
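The right-hand stage of Figure 4 (semi-supervised training on untranscribed data) follows the usual pseudo-labeling pattern: the current model transcribes unlabeled audio, and only confident hypotheses are kept as extra training pairs. A minimal sketch of that filtering step, where the `transcribe` callable, its confidence score, and the 0.9 threshold are hypothetical stand-ins, not the authors' implementation:

```python
def pseudo_label(transcribe, unlabeled, threshold=0.9):
    """Pseudo-labeling step: transcribe untranscribed audio with the current
    model and keep only confident hypotheses as (audio, text) training pairs.

    transcribe: callable mapping an audio item to (hypothesis, confidence).
    """
    pseudo = []
    for audio in unlabeled:
        text, confidence = transcribe(audio)  # hypothesis + confidence score
        if confidence >= threshold:           # discard unreliable transcripts
            pseudo.append((audio, text))
    return pseudo
```

The retained pairs would then be mixed with the transcribed and TTS-synthesized data for the next training round.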

Figure 5:

Comparative results.

Figure 6:

WER on Odia, Hindi, and Marathi using (A) TDNN, (B) Transformer, and (C) the proposed LSTM-Transformer. WER, word error rate; TDNN, time-delay neural network.

Dataset details for each language

                     |        Hindi         |       Marathi        |         Odia
                     | Train   Test   Val.  | Train   Test   Val.  | Train   Test   Val.
Size in hours        | 95.05   5.55   5.49  | 93.89   5.0    0.67  | 94.54   5.49   4.66
Channel compression  | 3GP     3GP    3GP   | 3GP     3GP    M4A   | M4A     M4A    M4A
Unique sentences     | 4506    386    316   | 2543    200    120   | 820     65     124
# Speakers           | 59      19     18    | 31      –      –     | 31      –      –
Words in vocabulary  | 6092    1681   1359  | 3245    547    350   | 1584    224    334

WER (%) for Indian languages

Model                                 |     Hindi       |     Marathi     |      Odia
                                      | w/o LM  with LM | w/o LM  with LM | w/o LM  with LM
Proposed LSTM-Transformer (baseline)  | 14.1    11.4    | 12.7    10.6    | 24.6    21.3
+ Synthetic data augmentation         | 13.5    10.9    | 12.3    10.2    | 24.2    21.0
+ Semi-supervised training            | 13.2    10.5    | 12.1    9.8     | 23.9    20.6
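The WER figures in the table are word error rates: the word-level edit distance (substitutions, insertions, deletions) between hypothesis and reference, divided by the reference length, times 100. A minimal self-contained sketch of that computation, not the evaluation code used in the paper:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / #reference words, as %."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution (0 if match)
        prev = cur
    return 100.0 * prev[-1] / len(ref)
```

For example, `wer("a b c", "a x c")` gives one substitution over three reference words, i.e. about 33.3%.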
Language: English
Submitted on: Nov 20, 2024
Published on: Mar 4, 2025
Published by: Professor Subhas Chandra Mukhopadhyay
In partnership with: Paradigm Publishing Services
Publication frequency: once per year

© 2025 Tripti Choudhary, Vishal Goyal, Atul Bansal, published by Professor Subhas Chandra Mukhopadhyay
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.