[Full 12-question Likert survey, plus forced-choice transcription test, available in supplementary materials.]
For years, text-to-speech was purely utilitarian. It was designed for screen readers, phone trees, and GPS systems. The goal was neutrality. The goal was to be invisible. wiseguy text to speech
A lightweight DSP block adds:
The drop in naturalness (4.5 → 3.9) aligns with findings in expressive TTS: caricature-level prosody reduces perceived "human likeness." However, for applications like video game NPCs or comedy dubbing, high authenticity outweighs perfect naturalness. [Full 12-question Likert survey
[Generated for Academic Purposes] Publication Date: April 14, 2026 Journal: Journal of Synthetic Media and Paralinguistics , Vol. 19, Issue 2 plus forced-choice transcription test
[Full 12-question Likert survey, plus forced-choice transcription test, available in supplementary materials.]
For years, text-to-speech was purely utilitarian. It was designed for screen readers, phone trees, and GPS systems. The goal was neutrality. The goal was to be invisible.
A lightweight DSP block adds:
The drop in naturalness (4.5 → 3.9) aligns with findings in expressive TTS: caricature-level prosody reduces perceived "human likeness." However, for applications like video game NPCs or comedy dubbing, high authenticity outweighs perfect naturalness.
[Generated for Academic Purposes] Publication Date: April 14, 2026 Journal: Journal of Synthetic Media and Paralinguistics , Vol. 19, Issue 2