
Enhanced LSTM network with semi-supervised learning and data augmentation for low-resource ASR

Open Access | Mar 2025

Figures & Tables

Figure 1:

The proposed methodology for the LSTM-based Transformer. LSTM, long short-term memory.

Figure 2:

LSTM block structure. LSTM, long short-term memory.
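The LSTM block in Figure 2 gates information flow through input, forget, and output gates acting on a cell state. As a minimal sketch of a generic LSTM cell step (a standard formulation, not the authors' exact implementation; dimensions and weights here are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step: four gates computed from input x and previous hidden state."""
    z = W @ x + U @ h_prev + b      # pre-activations for all four gates, stacked
    H = h_prev.shape[0]
    i = sigmoid(z[0:H])             # input gate
    f = sigmoid(z[H:2*H])           # forget gate
    o = sigmoid(z[2*H:3*H])         # output gate
    g = np.tanh(z[3*H:4*H])         # candidate cell update
    c = f * c_prev + i * g          # new cell state
    h = o * np.tanh(c)              # new hidden state
    return h, c

# Toy dimensions: input size D, hidden size H
rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, U, b)
```

The forget gate multiplying the previous cell state is what lets the cell carry information across long spans, which is the property the paper's architecture builds on.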

Figure 3:

Overview of the data augmentation strategy and training pipeline for the proposed model.
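The details of the augmentation strategy in Figure 3 are not given in this excerpt; one common low-resource ASR augmentation is speed perturbation, which resamples the waveform to simulate speaking-rate changes. A minimal sketch under that assumption (function name and factors are illustrative, not from the paper):

```python
import numpy as np

def speed_perturb(wave, factor):
    """Resample a 1-D waveform to simulate a speaking-rate change.
    factor > 1 yields a shorter (faster) signal; factor < 1 a longer (slower) one."""
    n_out = int(round(len(wave) / factor))
    old_idx = np.arange(len(wave))
    new_idx = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(new_idx, old_idx, wave)  # linear interpolation at new sample points

wave = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 16000))  # 1 s toy tone at 16 kHz
fast = speed_perturb(wave, 1.1)
slow = speed_perturb(wave, 0.9)
```

Each perturbed copy is treated as an additional training utterance with the same transcript, multiplying the effective amount of supervised data.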

Figure 4:

Diagram of the proposed framework combining multilingual supervised training and semi-supervised training for Indian languages using untranscribed data. The proposed enhanced LSTM-Transformer architecture is trained in supervised mode (left); synthetic data from a TTS system is amalgamated (middle); semi-supervised training uses both transcribed and untranscribed data (right). LSTM, long short-term memory; TTS, text-to-speech.
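Semi-supervised training on untranscribed speech is commonly implemented by pseudo-labeling: the current model decodes the untranscribed audio, and only confident hypotheses are kept as training targets. A toy sketch of that selection step (the decoder stub, names, and threshold are hypothetical, not the paper's API):

```python
def pseudo_label(untranscribed, decode, threshold=0.8):
    """Decode untranscribed utterances and keep only confident hypotheses."""
    selected = []
    for utt in untranscribed:
        hyp, conf = decode(utt)          # hypothesis transcript and its confidence
        if conf >= threshold:
            selected.append((utt, hyp))  # confident pair joins the training pool
    return selected

# Stand-in decoder: returns a fake transcript and confidence per utterance id
decode = lambda utt: (f"hyp-{utt}", 0.9 if utt % 2 == 0 else 0.5)

labeled = [(0, "ref-0")]                              # manually transcribed data
augmented = labeled + pseudo_label([1, 2, 3, 4], decode)
```

In a real pipeline the merged set would then be used for another round of training, and decoding/selection can be iterated as the model improves.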

Figure 5:

Comparative results.

Figure 6:

WER on Odia, Hindi, and Marathi using (A) TDNN, (B) Transformer, and (C) the proposed LSTM-Transformer. TDNN, time-delay neural network; WER, word error rate.

Dataset details for each language

|                     | Hindi (Train) | Hindi (Test) | Hindi (Val.) | Marathi (Train) | Marathi (Test) | Marathi (Val.) | Odia (Train) | Odia (Test) | Odia (Val.) |
|---------------------|---------------|--------------|--------------|-----------------|----------------|----------------|--------------|-------------|-------------|
| Size in hours       | 95.05         | 5.55         | 5.49         | 93.89           | 5.0            | 0.67           | 94.54        | 5.49        | 4.66        |
| Channel compression | 3GP           | 3GP          | 3GP          | 3GP             | 3GP            | M4A            | M4A          | M4A         | M4A         |
| Unique sentences    | 4506          | 386          | 316          | 2543            | 200            | 120            | 820          | 65          | 124         |
| # Speakers          | 59            | 19           | 18           | 31              | 31             |                |              |             |             |
| Words in vocabulary | 6092          | 1681         | 1359         | 3245            | 547            | 350            | 1584         | 224         | 334         |

WER (%) for Indian languages

| Model                                | Hindi (w/o LM) | Hindi (with LM) | Marathi (w/o LM) | Marathi (with LM) | Odia (w/o LM) | Odia (with LM) |
|--------------------------------------|----------------|-----------------|------------------|-------------------|---------------|----------------|
| Proposed LSTM-Transformer (baseline) | 14.1           | 11.4            | 12.7             | 10.6              | 24.6          | 21.3           |
| + Synthetic data augmentation        | 13.5           | 10.9            | 12.3             | 10.2              | 24.2          | 21.0           |
| + Semi-supervised training           | 13.2           | 10.5            | 12.1             | 9.8               | 23.9          | 20.6           |
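The tables report word error rate (WER): the Levenshtein distance between the reference and hypothesis word sequences, divided by the reference length. A minimal self-contained sketch of the standard computation (not tied to the authors' toolkit):

```python
def wer(ref, hyp):
    """Word error rate (%) via Levenshtein distance over word sequences."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between first i reference and first j hypothesis words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                      # deletions only
    for j in range(len(h) + 1):
        d[0][j] = j                      # insertions only
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])   # match or substitution
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return 100.0 * d[len(r)][len(h)] / len(r)
```

For example, one substitution in a three-word reference gives a WER of 33.3%, the same scale used in the results tables above.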
Language: English
Submitted on: Nov 20, 2024 | Published on: Mar 4, 2025
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Tripti Choudhary, Vishal Goyal, Atul Bansal, published by Professor Subhas Chandra Mukhopadhyay
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.