Parler-TTS 🗣️

Parler-TTS is a training and inference library for high-quality text-to-speech (TTS) models. This demo showcases the IndicParlerTTS model, which generates natural, expressive speech in over 22 Indian languages. A simple text prompt controls features such as speaker style, tone, pitch, and pace.

Tips for effective usage:

  • Use detailed captions to describe the speaker and desired characteristics (e.g., "Aditi speaks in a slightly expressive tone, with clear audio quality and a moderate pace.").
  • For best results, reference specific named speakers provided in the model card on the model page.
  • Include terms like "very clear audio" or "slightly noisy audio" to control the audio quality and background ambiance.
  • Punctuation can be used to shape prosody (e.g., commas add pauses for natural phrasing).
  • If unsure about what caption to use, you can start with: "The speaker speaks naturally. The recording is very high quality with no background noise."
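
The tips above can be combined into a single caption string. The sketch below is only an illustration: `CaptionSpec`, `build_caption`, and `caption_or_default` are hypothetical helpers, not part of Parler-TTS, and the sentence template is an assumption modelled on the example captions in the tips.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical helpers for composing a caption; not part of Parler-TTS.

# Fallback caption quoted from the tips above.
DEFAULT_CAPTION = (
    "The speaker speaks naturally. "
    "The recording is very high quality with no background noise."
)


@dataclass
class CaptionSpec:
    speaker: str = "The speaker"            # a named speaker, e.g. "Aditi"
    tone: str = "slightly expressive"       # speaking style
    pace: str = "moderate"                  # speaking rate
    audio_quality: str = "very clear audio" # or "slightly noisy audio"


def _with_article(phrase: str) -> str:
    """Prefix a phrase with 'a' or 'an' based on its first letter."""
    article = "an" if phrase[:1].lower() in "aeiou" else "a"
    return f"{article} {phrase}"


def build_caption(spec: CaptionSpec) -> str:
    """Assemble one caption covering speaker, tone, pace, and audio quality."""
    return (
        f"{spec.speaker} speaks in {_with_article(spec.tone)} tone "
        f"at {_with_article(spec.pace)} pace. "
        f"The recording has {spec.audio_quality}."
    )


def caption_or_default(spec: Optional[CaptionSpec] = None) -> str:
    """Use the fallback caption when no characteristics are specified."""
    return DEFAULT_CAPTION if spec is None else build_caption(spec)


if __name__ == "__main__":
    print(caption_or_default(CaptionSpec(speaker="Aditi")))
    print(caption_or_default())
```

The resulting string is what you would supply as the caption; the text to be spoken is passed separately, with punctuation shaping the prosody as noted above.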

If you'd like to learn more about how the model was trained or explore fine-tuning it yourself, visit the Parler-TTS repository on GitHub. The Parler-TTS codebase and associated checkpoints are released under the Apache 2.0 license.