
Enhanced LSTM network with semi-supervised learning and data augmentation for low-resource ASR

Open Access | Mar 2025

Figures & Tables

Figure 1:

The proposed methodology for the LSTM-based Transformer. LSTM, long short-term memory.

Figure 2:

LSTM block structure. LSTM, long short-term memory.
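The LSTM block in Figure 2 gates information flow through input, forget, and output gates acting on a cell state. As a minimal sketch of a generic LSTM cell step (a standard formulation, not the authors' exact implementation; dimensions and weights here are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step: four gates computed from input x and previous hidden state."""
    z = W @ x + U @ h_prev + b      # pre-activations for all four gates, stacked
    H = h_prev.shape[0]
    i = sigmoid(z[0:H])             # input gate
    f = sigmoid(z[H:2*H])           # forget gate
    o = sigmoid(z[2*H:3*H])         # output gate
    g = np.tanh(z[3*H:4*H])         # candidate cell update
    c = f * c_prev + i * g          # new cell state
    h = o * np.tanh(c)              # new hidden state
    return h, c

# Toy dimensions: input size D, hidden size H
rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, U, b)
```

The forget gate multiplying the previous cell state is what lets the cell carry information across long spans, which is the property the paper's architecture builds on.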

Figure 3:

Overview of the data augmentation strategy and training pipeline for the proposed model.
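The details of the augmentation strategy in Figure 3 are not given in this excerpt; one common low-resource ASR augmentation is speed perturbation, which resamples the waveform to simulate speaking-rate changes. A minimal sketch under that assumption (function name and factors are illustrative, not from the paper):

```python
import numpy as np

def speed_perturb(wave, factor):
    """Resample a 1-D waveform to simulate a speaking-rate change.
    factor > 1 yields a shorter (faster) signal; factor < 1 a longer (slower) one."""
    n_out = int(round(len(wave) / factor))
    old_idx = np.arange(len(wave))
    new_idx = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(new_idx, old_idx, wave)  # linear interpolation at new sample points

wave = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 16000))  # 1 s toy tone at 16 kHz
fast = speed_perturb(wave, 1.1)
slow = speed_perturb(wave, 0.9)
```

Each perturbed copy is treated as an additional training utterance with the same transcript, multiplying the effective amount of supervised data.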

Figure 4:

Diagram of the proposed framework combining multilingual supervised training and semi-supervised training for Indian languages using untranscribed data. The proposed enhanced LSTM-Transformer architecture is trained in supervised mode (left); synthetic data from a TTS system is amalgamated (middle); semi-supervised training uses both transcribed and untranscribed data (right). LSTM, long short-term memory; TTS, text-to-speech.
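Semi-supervised training on untranscribed speech is commonly implemented by pseudo-labeling: the current model decodes the untranscribed audio, and only confident hypotheses are kept as training targets. A toy sketch of that selection step (the decoder stub, names, and threshold are hypothetical, not the paper's API):

```python
def pseudo_label(untranscribed, decode, threshold=0.8):
    """Decode untranscribed utterances and keep only confident hypotheses."""
    selected = []
    for utt in untranscribed:
        hyp, conf = decode(utt)          # hypothesis transcript and its confidence
        if conf >= threshold:
            selected.append((utt, hyp))  # confident pair joins the training pool
    return selected

# Stand-in decoder: returns a fake transcript and confidence per utterance id
decode = lambda utt: (f"hyp-{utt}", 0.9 if utt % 2 == 0 else 0.5)

labeled = [(0, "ref-0")]                              # manually transcribed data
augmented = labeled + pseudo_label([1, 2, 3, 4], decode)
```

In a real pipeline the merged set would then be used for another round of training, and decoding/selection can be iterated as the model improves.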

Figure 5:

Comparative results.

Figure 6:

WER on Odia, Hindi, and Marathi using (A) TDNN, (B) Transformer, and (C) the proposed LSTM-Transformer. TDNN, time-delay neural network; WER, word error rate.

Dataset details for each language

|                     | Hindi (Train) | Hindi (Test) | Hindi (Val.) | Marathi (Train) | Marathi (Test) | Marathi (Val.) | Odia (Train) | Odia (Test) | Odia (Val.) |
|---------------------|---------------|--------------|--------------|-----------------|----------------|----------------|--------------|-------------|-------------|
| Size in hours       | 95.05         | 5.55         | 5.49         | 93.89           | 5.0            | 0.67           | 94.54        | 5.49        | 4.66        |
| Channel compression | 3GP           | 3GP          | 3GP          | 3GP             | 3GP            | M4A            | M4A          | M4A         | M4A         |
| Unique sentences    | 4506          | 386          | 316          | 2543            | 200            | 120            | 820          | 65          | 124         |
| # Speakers          | 59            | 19           | 18           | 31              | 31             |                |              |             |             |
| Words in vocabulary | 6092          | 1681         | 1359         | 3245            | 547            | 350            | 1584         | 224         | 334         |

WER (%) for Indian languages

| Model                                | Hindi (w/o LM) | Hindi (with LM) | Marathi (w/o LM) | Marathi (with LM) | Odia (w/o LM) | Odia (with LM) |
|--------------------------------------|----------------|-----------------|------------------|-------------------|---------------|----------------|
| Proposed LSTM-Transformer (baseline) | 14.1           | 11.4            | 12.7             | 10.6              | 24.6          | 21.3           |
| + Synthetic data augmentation        | 13.5           | 10.9            | 12.3             | 10.2              | 24.2          | 21.0           |
| + Semi-supervised training           | 13.2           | 10.5            | 12.1             | 9.8               | 23.9          | 20.6           |
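The tables report word error rate (WER): the Levenshtein distance between the reference and hypothesis word sequences, divided by the reference length. A minimal self-contained sketch of the standard computation (not tied to the authors' toolkit):

```python
def wer(ref, hyp):
    """Word error rate (%) via Levenshtein distance over word sequences."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between first i reference and first j hypothesis words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                      # deletions only
    for j in range(len(h) + 1):
        d[0][j] = j                      # insertions only
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])   # match or substitution
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return 100.0 * d[len(r)][len(h)] / len(r)
```

For example, one substitution in a three-word reference gives a WER of 33.3%, the same scale used in the results tables above.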
Language: English
Submitted on: Nov 20, 2024 | Published on: Mar 4, 2025
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Tripti Choudhary, Vishal Goyal, Atul Bansal, published by Professor Subhas Chandra Mukhopadhyay
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.