
A multi-threaded approach for improved and faster accent transcription of chemical terms

Open Access | Apr 2025

Figures & Tables

Figure 1: Overview of the proposed work.
Figure 2: Initial model.
Figure 3: Flow diagram of improved model.
Figure 4: Improved model.
Figure 5: Comparison performance (in seconds).
Figure 6: First meaningful transcription time.
Figure 7: Stress testing (hours).
Figure 8: WER scores without noise. WER, word error rate.
Figure 9: WER scores with noise. WER, word error rate.
Figure 10: Time taken for transcription.
Figure 11: WER comparison with Google-STT. WER, word error rate; STT, Speech-to-Text.
Figure 12: Time taken for transcription comparison with Google STT. STT, Speech-to-Text.
Figure 13: Confusion matrix for classification of chemical elements from text.
Figure 14: Web application.
Figure 15: Email details.
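
Figures 8, 9, and 11 report word error rate (WER). For reference, the sketch below computes WER in its standard edit-distance form: the number of word substitutions, deletions, and insertions divided by the number of reference words. This is the conventional definition of the metric, not the authors' evaluation script, and the example strings are invented.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for word-level Levenshtein distance.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j          # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution ("sulphate" -> "sulfate") in a 4-word reference gives WER = 0.25.
print(wer("dissolve the copper sulphate", "dissolve the copper sulfate"))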

Comparative results (in seconds)

Audio file | Audio duration | Initial model | Improved model
audio 001 | 38.15 | 44.80 | 40.83
audio 002 | 70.97 | 79.53 | 79.83
audio 003 | 80.69 | 87.78 | 82.72
audio 004 | 54.86 | 62.19 | 59.21
audio 005 | 33.25 | 38.09 | 39.40
audio 006 | 40.93 | 58.66 | 53.68
audio 007 | 48.13 | 53.85 | 51.81
audio 008 | 33.49 | 38.68 | 35.13
audio 009 | 33.94 | 38.55 | 33.82
audio 010 | 48.95 | 54.28 | 50.15
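
The gap between the initial and improved columns above reflects the multi-threaded processing the paper proposes. The sketch below illustrates one way such chunk-level parallelism can be organised; the chunking step and the hypothetical transcribe_chunk helper stand in for the paper's actual recognizer and pipeline, not its code.

from concurrent.futures import ThreadPoolExecutor

def transcribe_chunk(chunk_path: str) -> str:
    """Hypothetical placeholder for the recognizer call on one audio chunk."""
    raise NotImplementedError

def transcribe_parallel(chunk_paths: list[str], workers: int = 4) -> str:
    # Each worker transcribes one chunk; map() returns results in submission
    # order, so the stitched transcript follows the original audio order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return " ".join(pool.map(transcribe_chunk, chunk_paths))

The number of worker threads would be tuned to the hardware; too many workers can oversubscribe the CPU or GPU and erase the gain.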

Performance of existing AER systems over Indian accents

Feature | Whisper (OpenAI) [16] | Wav2Vec2 (Meta) [17] | Google STT [18]
Indian Accent Support | Strong (multilingual model trained on diverse accents) [19,20] | Varies (depends on fine-tuned dataset) [20] | Good (Google has extensive Indian English training data) [21]
Regional Variants (Hindi-English, Tamil-English, etc.) | Handles code-switching well [22] | Requires specific fine-tuning for mixed languages [23] | Decent but struggles with heavy accents [18]
Noise Robustness | Strong (performs well in real-world noisy environments) [16] | Moderate (depends on fine-tuned model) [17] | Good (handles background noise effectively) [18]
Spoken Speed Adaptability | Good (handles fast speech well) [22] | Varies (pre-trained models sometimes struggle) [23] | Good (adjusts well to fast-paced speech) [18]

First meaningful transcription time (in seconds)

Audio | Duration | Initial model | Improved model
audio 001 | 38.15 | 44.80 | 3.00
audio 002 | 70.97 | 79.53 | 5.05
audio 003 | 80.69 | 87.78 | 4.33
audio 004 | 54.86 | 62.19 | 4.35
audio 005 | 33.25 | 38.09 | 2.87
audio 006 | 40.93 | 58.66 | 6.10
audio 007 | 48.13 | 53.85 | 3.05
audio 008 | 33.49 | 38.68 | 2.73
audio 009 | 33.94 | 38.55 | 2.51
audio 010 | 48.95 | 54.28 | 3.40
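
The improved model's first-output times of a few seconds, against full-file times of 30 to 90 seconds, are consistent with emitting each chunk's text as soon as it is ready rather than waiting for the whole file. The sketch below shows that pattern, reusing the hypothetical transcribe_chunk helper from the earlier example; it illustrates the idea, not the paper's implementation.

from concurrent.futures import ThreadPoolExecutor
from typing import Iterator

def stream_transcripts(chunk_paths: list[str], workers: int = 4) -> Iterator[str]:
    # Submit every chunk up front, then yield transcripts in audio order as soon
    # as each leading chunk finishes, so the first text appears after only one
    # chunk's worth of latency instead of the full file's.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(transcribe_chunk, p) for p in chunk_paths]
        for future in futures:
            yield future.result()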

Performance of existing AER systems for chemical term recognition

Feature | Whisper (OpenAI) | Wav2Vec2 (Meta) | Google STT
Chemical Terms Recognition | Limited (depends on general training data, not domain-specific) [16] | Can be fine-tuned for better accuracy [17] | Good (Google’s general corpus covers some scientific terms) [18]
Adaptability to Scientific Jargon | Poor without custom fine-tuning [19] | Can be trained on specialized datasets [20] | Better but not perfect [21]
Handling of Long & Complex Terms | Struggles with rare chemical names [16] | Can be improved with domain-specific training [17] | Sometimes recognizes common scientific terms but struggles with rare ones [18]
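
Figure 13 reports a confusion matrix for classifying chemical elements from transcribed text. As a rough illustration of how chemical terms can be recovered from an imperfect transcript, the sketch below matches words against a small element vocabulary with fuzzy string matching; the vocabulary, threshold, and approach are illustrative assumptions, not the paper's method.

import difflib

# Small illustrative vocabulary; a real term list would be far larger.
ELEMENTS = ["hydrogen", "helium", "sodium", "potassium", "chlorine", "sulfur", "copper"]

def find_elements(transcript: str, cutoff: float = 0.6) -> list[str]:
    """Map transcript words to known element names, tolerating small ASR spelling errors."""
    found = []
    for word in transcript.lower().split():
        match = difflib.get_close_matches(word, ELEMENTS, n=1, cutoff=cutoff)
        if match:
            found.append(match[0])
    return found

# "sulpher" is close enough to "sulfur" to be recovered despite the misspelling.
print(find_elements("add sulpher and copper to the solution"))

A tighter cutoff trades recall for precision, which is exactly the balance the confusion matrix in Figure 13 visualises.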

Stress testing (hours)

Audio | Duration | Initial model | Improved model
long audio01 | 1.144 | 1.299 | 1.144
long audio02 | 3.027 | 3.363 | 3.029

Language: English
Submitted on: Feb 5, 2025
Published on: Apr 25, 2025
Published by: Professor Subhas Chandra Mukhopadhyay
In partnership with: Paradigm Publishing Services
Publication frequency: once per year

© 2025 Sonali Kothari, Shwetambari Chiwhane, Shreeja Mehta, Pranav Naranatt, Md. Asad Ansari, Rithwik Satya, published by Professor Subhas Chandra Mukhopadhyay
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.