🎙️ GibbsTTS — Zero-Shot Voice Cloning TTS

Official interactive demo for Kinetic-Optimal Scheduling with Moment Correction for Metric-Induced Discrete Flow Matching in Zero-Shot Text-to-Speech.

Upload a short reference speech clip (a few seconds is enough).
The reference transcript is optional. Leave it blank, choose the ASR language (and click the Auto-transcribe button), and Whisper will transcribe the audio automatically.
Then type the text you want to synthesize, choose the reference and TTS languages, and click the Synthesize button. The model will speak the text in the reference voice.
Supports English, Mandarin Chinese, mixed English/Chinese, and Japanese (LoRA fine-tuned).
Also supports cross-lingual synthesis.

ASR language

Language hint for Whisper. Choose None to use auto-detection.

Reference language

Language of the reference audio/transcript.

TTS language

Language used by GibbsTTS for synthesis.
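
The transcript fallback described above (use the typed transcript if there is one, otherwise run ASR with an optional language hint) can be sketched as follows. This is only an illustration of the logic, not the demo's actual code: resolve_reference_transcript, the Transcriber type, and fake_transcriber are hypothetical names, and the stand-in transcriber takes the place of a real Whisper call.

```python
from typing import Callable, Optional

# Hypothetical type: any function that maps (audio_path, language_or_None) to text.
# In the real demo this role is played by Whisper; here it is left pluggable.
Transcriber = Callable[[str, Optional[str]], str]


def resolve_reference_transcript(
    audio_path: str,
    transcript: Optional[str],
    asr_language: Optional[str],
    transcribe: Transcriber,
) -> str:
    """Return the user's transcript if given, otherwise fall back to ASR."""
    # A non-blank transcript typed by the user always wins.
    if transcript and transcript.strip():
        return transcript.strip()
    # "None" in the ASR language field means: let the recognizer auto-detect.
    language = None if asr_language in (None, "", "None") else asr_language
    return transcribe(audio_path, language)


def fake_transcriber(audio_path: str, language: Optional[str]) -> str:
    """Stand-in for a real ASR model, used only to make the sketch runnable."""
    return f"[transcript of {audio_path}, language={language or 'auto'}]"


if __name__ == "__main__":
    print(resolve_reference_transcript("ref.wav", "", "None", fake_transcriber))
    print(resolve_reference_transcript("ref.wav", "Hello there.", "en", fake_transcriber))
```

Keeping the transcriber as a parameter is a design choice for the sketch only: it lets the fallback logic be exercised without loading a speech model.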

Examples
Reference audio (prompt) | Reference transcript (optional) | Target text (what you want the model to speak) | ASR language | Reference language | TTS language