---
license: other
language:
- ja
---
# piper-plus-moe
A Japanese multi-speaker text-to-speech model based on the VITS architecture, compatible with [Piper TTS](https://github.com/rhasspy/piper).
## Model Overview
| Item | Value |
|------|-------|
| Language | Japanese (ja) |
| Phoneme Type | OpenJTalk |
| Number of Speakers | 20 |
| Sample Rate | 22050 Hz |
| Training Epochs | 200 |
| Architecture | VITS |
| Framework | [piper-plus](https://github.com/ayutaz/piper-plus) |
## Usage
### With Piper TTS
```bash
# Download model files
wget https://huggingface.co/ayousanz/piper-plus-moe/resolve/main/moe-speech-20speakers-200epochs.onnx
wget https://huggingface.co/ayousanz/piper-plus-moe/resolve/main/moe-speech-20speakers-200epochs.onnx.json
# Run inference with Piper (the input text reads: "Hello, I am a Japanese speech synthesis model.")
echo "こんにけは、私はζ—₯本θͺžιŸ³ε£°εˆζˆγƒ’デルです。" | piper \
--model moe-speech-20speakers-200epochs.onnx \
--output_file output.wav \
--speaker 0
```
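The `--speaker` flag selects one of the 20 voices by numeric ID. The mapping from speaker names to IDs lives in the `.onnx.json` config file; a minimal sketch of reading it, assuming the standard Piper config fields `num_speakers` and `speaker_id_map` (demonstrated here on a small hypothetical two-speaker config rather than the real downloaded file):

```python
import json

def list_speakers(config_path):
    """Return (num_speakers, {name: id}) from a Piper voice config JSON."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    # Piper voice configs carry the speaker count and a name-to-id map;
    # single-speaker voices may omit the map, hence the defaults.
    return config.get("num_speakers", 1), config.get("speaker_id_map", {})

# Demo with a hypothetical two-speaker config (the real file maps 20 speakers):
sample = {"num_speakers": 2, "speaker_id_map": {"speaker_a": 0, "speaker_b": 1}}
with open("sample-config.json", "w", encoding="utf-8") as f:
    json.dump(sample, f)

n, id_map = list_speakers("sample-config.json")
print(n, id_map)  # β†’ 2 {'speaker_a': 0, 'speaker_b': 1}
```

Any ID printed by such a script can be passed directly as the `--speaker` argument above.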
## Training Command
```bash
NCCL_DEBUG=INFO NCCL_P2P_DISABLE=1 NCCL_IB_DISABLE=1 uv run python -m piper_train \
--dataset-dir /data/piper/dataset-moe-speech-20speakers \
--accelerator gpu --devices 4 --precision 16-mixed \
--max_epochs 200 --batch-size 32 --samples-per-speaker 4 \
--checkpoint-epochs 1 --quality medium \
--base_lr 2e-4 --disable_auto_lr_scaling \
--ema-decay 0.9995 --num-workers 0 --no-pin-memory \
--default_root_dir /data/piper/output-moe-speech-20speakers-lr2e4-fixed
```
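A quick sketch of the training configuration implied by these flags. Note that `--disable_auto_lr_scaling` pins the learning rate at `--base_lr` regardless of device count; the arithmetic below assumes the common linear-scaling convention (learning rate multiplied by the number of devices), which is an assumption about piper-plus's auto-scaling rule, not a confirmed detail:

```python
# Effective configuration for 4 GPUs at per-device batch size 32.
devices = 4
per_device_batch = 32
base_lr = 2e-4

# Global batch size: samples consumed per optimizer step across all devices.
global_batch = devices * per_device_batch
# Hypothetical auto-scaled LR under the linear-scaling rule, i.e. what
# --disable_auto_lr_scaling prevents in this assumed scheme.
auto_scaled_lr = base_lr * devices

print(global_batch, auto_scaled_lr)  # β†’ 128 0.0008
```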
## Dataset
- **Source**: moe-speech dataset
- **Total Speakers**: 20
- **Total Utterances**: ~60,000
- **Total Duration**: ~90 hours
- **Phoneme Type**: OpenJTalk
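A back-of-envelope check of the figures above (both are stated as approximate, so the result is only indicative):

```python
# ~90 hours over ~60,000 utterances implies the average clip length.
utterances = 60_000
hours = 90

avg_seconds = hours * 3600 / utterances
print(avg_seconds)  # β†’ 5.4 seconds per utterance on average
```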
## Citation
```bibtex
@misc{piper-plus-moe,
  author    = {ayousanz},
  title     = {piper-plus-moe: Japanese Multi-Speaker TTS Model},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/ayousanz/piper-plus-moe}
}
```
## Acknowledgements
- [Piper TTS](https://github.com/rhasspy/piper) - Original TTS framework
- [VITS](https://github.com/jaywalnut310/vits) - Model architecture
- moe-speech dataset contributors