---
license: other
language:
- ja
---
# piper-plus-moe

A Japanese multi-speaker text-to-speech model based on the VITS architecture, compatible with [Piper TTS](https://github.com/rhasspy/piper).

## Model Overview

| Item | Value |
|------|-------|
| Language | Japanese (ja) |
| Phoneme Type | OpenJTalk |
| Number of Speakers | 20 |
| Sample Rate | 22050 Hz |
| Training Epochs | 200 |
| Architecture | VITS |
| Framework | [piper-plus](https://github.com/ayutaz/piper-plus) |

## Usage

### With Piper TTS

```bash
# Download model files
wget https://huggingface.co/ayousanz/piper-plus-moe/resolve/main/moe-speech-20speakers-200epochs.onnx
wget https://huggingface.co/ayousanz/piper-plus-moe/resolve/main/moe-speech-20speakers-200epochs.onnx.json

# Run inference with Piper
echo "こんにちは、私は日本語音声合成モデルです。" | piper \
  --model moe-speech-20speakers-200epochs.onnx \
  --output_file output.wav \
  --speaker 0
```

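Because the model ships 20 speakers, a quick way to compare voices is to render the same sentence once per speaker ID. The sketch below assumes speaker IDs run from 0 to 19 (inferred from the speaker count; the `.onnx.json` config is the authoritative source for the mapping) and that `piper` is on your `PATH`. The function name `synthesize_all_speakers` and the `speaker_<id>.wav` naming are illustrative and not part of Piper.

```bash
# Hypothetical helper: render one WAV per speaker ID with the same text.
# Assumes speaker IDs are 0..(count-1); check the .onnx.json config to confirm.
synthesize_all_speakers() {
  local text="$1"
  local count="${2:-20}"   # number of speakers (20 per the model overview)
  local model="${3:-moe-speech-20speakers-200epochs.onnx}"
  local id=0
  while [ "$id" -lt "$count" ]; do
    echo "$text" | piper \
      --model "$model" \
      --output_file "speaker_${id}.wav" \
      --speaker "$id"
    id=$((id + 1))
  done
}
```

Calling `synthesize_all_speakers "こんにちは、私は日本語音声合成モデルです。"` would then write `speaker_0.wav` through `speaker_19.wav` into the current directory.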
## Training Command

```bash
NCCL_DEBUG=INFO NCCL_P2P_DISABLE=1 NCCL_IB_DISABLE=1 uv run python -m piper_train \
  --dataset-dir /data/piper/dataset-moe-speech-20speakers \
  --accelerator gpu --devices 4 --precision 16-mixed \
  --max_epochs 200 --batch-size 32 --samples-per-speaker 4 \
  --checkpoint-epochs 1 --quality medium \
  --base_lr 2e-4 --disable_auto_lr_scaling \
  --ema-decay 0.9995 --num-workers 0 --no-pin-memory \
  --default_root_dir /data/piper/output-moe-speech-20speakers-lr2e4-fixed
```

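To get a feel for the scale of this run, the effective batch size and the number of optimizer steps per epoch can be estimated from the flags above. This is a rough sketch: it assumes `--batch-size` is applied per device (the usual convention in distributed data-parallel training, but not confirmed for piper-plus), ignores any validation split, and uses the approximate utterance count given in this model card. The variable names are illustrative.

```bash
devices=4          # --devices
batch_size=32      # --batch-size (assumed to be per device)
utterances=60000   # approximate total from the Dataset section

effective_batch=$(( devices * batch_size ))
# Ceiling division: a final partial batch still counts as one step.
steps_per_epoch=$(( (utterances + effective_batch - 1) / effective_batch ))

echo "effective batch size: ${effective_batch}"
echo "steps per epoch (approx.): ${steps_per_epoch}"
```

Under these assumptions the run processes 128 utterances per step and takes roughly 469 steps per epoch.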
## Dataset

- **Source**: moe-speech dataset
- **Total Speakers**: 20
- **Total Utterances**: ~60,000
- **Total Duration**: ~90 hours
- **Phoneme Type**: OpenJTalk

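The approximate totals above imply an average utterance length and a per-speaker share, which can be checked with a few lines of arithmetic. The sketch assumes utterances are spread evenly across speakers, which the dataset description does not state, so the per-speaker figures are only indicative.

```bash
# Rough averages derived from the approximate totals listed above.
stats=$(awk 'BEGIN {
  hours = 90; utterances = 60000; speakers = 20
  printf "avg utterance length: %.1f s\n", hours * 3600 / utterances
  printf "utterances per speaker (if even): %d\n", utterances / speakers
  printf "hours per speaker (if even): %.1f\n", hours / speakers
}')
echo "$stats"
```

This works out to about 5.4 seconds per utterance and, under the even-split assumption, about 3,000 utterances (4.5 hours) per speaker.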
## Citation

```bibtex
@misc{piper-plus-moe,
  author = {ayousanz},
  title = {piper-plus-moe: Japanese Multi-Speaker TTS Model},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/ayousanz/piper-plus-moe}
}
```

## Acknowledgements

- [Piper TTS](https://github.com/rhasspy/piper) - Original TTS framework
- [VITS](https://github.com/jaywalnut310/vits) - Model architecture
- moe-speech dataset contributors