Abstract
We introduce Whisperer, an intelligent real-time prompting system designed to improve the flow and naturalness of public speaking and on-camera delivery. Unlike a conventional teleprompter, which scrolls through a script at a fixed pace, Whisperer uses Google Cloud’s low-latency speech-to-text (STT) and text-to-speech (TTS) services to keep the prepared script synchronized with what the speaker actually says. It tolerates synonyms, homophones, numeric variations, and spontaneous improvisation by combining linguistic resources and models such as the CMUDict pronouncing dictionary for phoneme-level alignment, FastText for semantic similarity, and BERT for contextual understanding. Whisperer also provides adaptive TTS feedback that matches the speaker’s pace, adjusting in real time to pause length and speaking rate. In testing, speakers using Whisperer showed improved fluency and more consistent delivery.