Whisperer: A Real-Time Prompting System with Multilayered Semantic Matching and Adaptive Speech Synthesis

Gülbahar Elmas; Şeymanur Karaçalı; Ebrar Yiğit; Fatma Patlar Akbulut

doi:10.2478/acss-2025-0016


Applied Computer Systems

Volume 30 (2025): Issue 1 (January 2025)

Open Access

Abstract

Whisperer is an intelligent, real-time prompting system designed to improve the flow and naturalness of public and on-camera speaking. Unlike a conventional teleprompter, which only follows a fixed script, Whisperer uses Google Cloud's low-latency speech-to-text (STT) and text-to-speech (TTS) services to synchronize spoken content with a prepared script in real time. It combines CMUDict for phoneme-level alignment, FastText for semantic similarity, and BERT for contextual understanding, which lets it handle synonyms, homophones, numeric variations, and spontaneous improvisation. Whisperer also provides adaptive TTS feedback that follows the speaker's pace, adjusting in real time to pause durations and speaking rate. In testing, speakers showed improvements in both fluency and delivery consistency.
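The abstract states that Whisperer keeps a prepared script synchronized with live speech while tolerating improvisation. Below is a minimal Python sketch of one way such tracking could work. It is not the authors' implementation.

A cursor advances through the script whenever a recognized word matches one of the next few script words. Any word that matches nothing is treated as improvisation. The following are all hypothetical choices:

- the class name `ScriptTracker`;
- the five-word lookahead window;
- the use of plain word equality.

The phonetic, semantic and contextual matching described in the abstract would replace the simple `in window` test.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class ScriptTracker:
    """Hypothetical cursor that follows a speaker through a prepared script."""

    def __init__(self, script: str, lookahead: int = 5):
        self.words = tokenize(script)
        self.cursor = 0  # index of the next script word we expect to hear
        self.lookahead = lookahead  # how far ahead a spoken word may match

    def feed(self, spoken: str) -> int:
        """Consume recognized speech and return the updated cursor position."""
        for word in tokenize(spoken):
            window = self.words[self.cursor:self.cursor + self.lookahead]
            if word in window:
                # Jump past the matched word; skipped script words count as omitted.
                self.cursor += window.index(word) + 1
            # Words with no match in the window are treated as improvisation
            # and leave the cursor where it is.
        return self.cursor

    def next_words(self, n: int = 3) -> List[str]:
        """Return the next n script words, e.g. to display or speak as a cue."""
        return self.words[self.cursor:self.cursor + n]
```

For example, feeding "Good morning, um, everyone" against the script "Good morning everyone, today we talk about speech synthesis." moves the cursor to 3: the filler "um" is ignored, and the next cue becomes `["today", "we", "talk"]`.

Because any in-window match moves the cursor, a common word spoken during an aside can cause a false jump. Weighing candidate matches by context avoids accepting the first equal word.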
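The abstract describes matching that tolerates homophones, numeric variations and synonyms. A minimal sketch of a layered word matcher follows. It assumes the layers are tried in order, from strictest to most permissive.

Several pieces are stand-ins, not the authors' data:

- The tiny phoneme table is written in CMUDict's ARPAbet style but stands in for CMUDict itself.
- The number table is made up.
- The three-dimensional vectors are made-up toy values standing in for FastText embeddings.

The BERT-based contextual layer is omitted. The function name `match_layer` and the 0.8 similarity threshold are hypothetical.

```python
import math
import re
from typing import Dict, Optional, Tuple

# Illustrative ARPAbet pronunciations in the style of CMUDict (not loaded from it).
PHONEMES: Dict[str, str] = {
    "their": "DH EH1 R",
    "there": "DH EH1 R",
    "to": "T UW1",
    "too": "T UW1",
    "two": "T UW1",
}

# Hypothetical normalization of spoken number words to digits.
NUMBER_WORDS: Dict[str, str] = {
    "one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
    "six": "6", "seven": "7", "eight": "8", "nine": "9", "ten": "10",
}

# Made-up toy vectors standing in for FastText word embeddings.
TOY_VECTORS: Dict[str, Tuple[float, ...]] = {
    "big": (0.90, 0.10, 0.00),
    "large": (0.85, 0.20, 0.05),
    "small": (-0.80, 0.30, 0.10),
}


def normalize(word: str) -> str:
    """Lowercase and drop everything except letters, digits and apostrophes."""
    return re.sub(r"[^a-z0-9']", "", word.lower())


def cosine(u: Tuple[float, ...], v: Tuple[float, ...]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_layer(script_word: str, spoken_word: str, threshold: float = 0.8) -> Optional[str]:
    """Return the name of the first layer that accepts the pair, or None."""
    s, h = normalize(script_word), normalize(spoken_word)
    if s == h:
        return "exact"
    # Numeric layer: "2" and "two" normalize to the same digit string.
    if NUMBER_WORDS.get(s, s) == NUMBER_WORDS.get(h, h):
        return "numeric"
    # Phonetic layer: homophones share a pronunciation entry.
    if s in PHONEMES and PHONEMES.get(s) == PHONEMES.get(h):
        return "phonetic"
    # Semantic layer: embedding similarity above a threshold.
    if s in TOY_VECTORS and h in TOY_VECTORS:
        if cosine(TOY_VECTORS[s], TOY_VECTORS[h]) >= threshold:
            return "semantic"
    return None
```

With these tables:

- `match_layer("there", "their")` returns `"phonetic"`.
- `match_layer("2", "two")` returns `"numeric"`.
- `match_layer("big", "large")` returns `"semantic"`.

Ordering the layers from strict to loose means that cheap, unambiguous matches are accepted before the more permissive similarity test is consulted.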
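The abstract also describes TTS feedback that adapts to the speaker's pace using pause length and speaking rate. Below is a minimal sketch of that idea. It assumes a linear mapping from measured words per minute to a TTS rate multiplier.

The following are hypothetical:

- the 150 wpm baseline;
- the ten-word window;
- the 2-second pause threshold;
- the function names.

The clamp to [0.25, 4.0] reflects the range Google Cloud Text-to-Speech documents for its `speaking_rate` field. The paper's actual adaptation rule is not given in this abstract.

```python
from typing import List, Optional

BASELINE_WPM = 150.0             # hypothetical pace that maps to TTS rate 1.0
MIN_RATE, MAX_RATE = 0.25, 4.0   # documented bounds of Google Cloud TTS speaking_rate


def words_per_minute(word_times: List[float], window: int = 10) -> Optional[float]:
    """Estimate speaking rate from the end times (seconds) of recent words."""
    recent = word_times[-window:]
    if len(recent) < 2:
        return None  # not enough evidence yet
    elapsed = recent[-1] - recent[0]
    if elapsed <= 0:
        return None
    # n timestamps span n - 1 inter-word intervals.
    return (len(recent) - 1) / elapsed * 60.0


def tts_rate(word_times: List[float], window: int = 10) -> float:
    """Map the measured pace to a clamped TTS rate multiplier (1.0 if unknown)."""
    wpm = words_per_minute(word_times, window)
    if wpm is None:
        return 1.0
    return max(MIN_RATE, min(MAX_RATE, wpm / BASELINE_WPM))


def should_prompt(last_word_time: float, now: float, pause_threshold: float = 2.0) -> bool:
    """Signal a cue when the speaker has been silent for at least pause_threshold seconds."""
    return now - last_word_time >= pause_threshold
```

For example, words ending at 0.0, 0.5, 1.0, 1.5 and 2.0 seconds correspond to 120 wpm, which gives a rate of about 0.8.

When `should_prompt` fires, a caller could pass the value from `tts_rate` as `speaking_rate` in a Google Cloud TTS `AudioConfig`. The spoken cue would then roughly follow the speaker's recent pace.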

DOI: https://doi.org/10.2478/acss-2025-0016 | Journal eISSN: 2255-8691 | Journal ISSN: 2255-8683

Language: English

Page range: 147–156

Submitted on: Jun 21, 2025

Accepted on: Nov 3, 2025

Published on: Nov 26, 2025

Published by: Riga Technical University

In partnership with: Paradigm Publishing Services

Publication frequency: Volume open

Keywords: phonetic confusion handling, real-time speech synchronization, semantic similarity evaluation, transformer-based NLP, word-level and sentence-level matching

Related subjects: Computer sciences, Artificial intelligence, Information technology, Project management, Software development

© 2025 Gülbahar Elmas, Şeymanur Karaçalı, Ebrar Yiğit, Fatma Patlar Akbulut, published by Riga Technical University
This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 License.