🎙️ GibbsTTS — Zero-Shot Voice Cloning TTS
Official interactive demo for the paper "Kinetic-Optimal Scheduling with Moment Correction for Metric-Induced Discrete Flow Matching in Zero-Shot Text-to-Speech".
- Paper: https://arxiv.org/abs/2605.09386
- Code: https://github.com/ydqmkkx/GibbsTTS
- Weights: https://huggingface.co/ydqmkkx/GibbsTTS
Upload a short reference speech recording (a few seconds is enough).
The reference transcript is optional: leave it blank, choose the ASR language (and click the Auto-transcribe button), and Whisper will transcribe the audio automatically.
Then type the text you want to synthesize, choose the reference and TTS languages, and click the Synthesize button. The model will speak the text in the reference voice.
Supported languages: English, Mandarin Chinese, mixed English/Chinese, and Japanese (LoRA fine-tuned).
Cross-lingual synthesis is also supported.
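Since a few seconds of reference audio is enough, a longer recording can be trimmed before uploading. The following is a minimal sketch using only Python's standard `wave` module. It is not part of GibbsTTS: the helper name `trim_wav`, the 5-second default, and the file names in the usage comment are illustrative assumptions, and it only handles uncompressed PCM WAV files.

```python
import wave


def trim_wav(src, dst, max_seconds=5.0):
    """Copy at most the first `max_seconds` of a PCM WAV from `src` to `dst`.

    `src` and `dst` may be file paths or binary file objects.
    Returns the duration (in seconds) of the clip that was written.
    NOTE: hypothetical helper for illustration; 5 s is an assumed default,
    not a limit documented by the demo.
    """
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        rate = reader.getframerate()
        # Keep the whole file if it is already shorter than the limit.
        n_keep = min(reader.getnframes(), int(max_seconds * rate))
        frames = reader.readframes(n_keep)

    with wave.open(dst, "wb") as writer:
        # Reuse channel count, sample width, and sample rate from the source;
        # the frame count in the header is corrected when the data is written.
        writer.setparams(params)
        writer.writeframes(frames)

    return n_keep / rate


# Example usage (file names are placeholders):
# trim_wav("long_recording.wav", "reference.wav", max_seconds=5.0)
```

The trimmed file can then be uploaded as the reference audio. Cutting at a fixed time may split a word, so it may be worth checking that the clip still matches the reference transcript.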
Examples
| Reference audio (prompt) | Reference transcript (optional) | Target text (what you want the model to speak) | ASR language | Reference language | TTS language |
|---|---|---|---|---|---|