metadata
license: other
language:
  - ja

piper-plus-moe

A Japanese multi-speaker text-to-speech model based on the VITS architecture and compatible with Piper TTS.

Model Overview

| Item | Value |
|---|---|
| Language | Japanese (ja) |
| Phoneme Type | OpenJTalk |
| Number of Speakers | 20 |
| Sample Rate | 22050 Hz |
| Training Epochs | 200 |
| Architecture | VITS |
| Framework | piper-plus |

Usage

With Piper TTS

# Download model files
wget https://huggingface.co/ayousanz/piper-plus-moe/resolve/main/moe-speech-20speakers-200epochs.onnx
wget https://huggingface.co/ayousanz/piper-plus-moe/resolve/main/moe-speech-20speakers-200epochs.onnx.json

# Run inference with Piper
echo "こんにちは、私は日本語音声合成モデルです。" | piper \
  --model moe-speech-20speakers-200epochs.onnx \
  --output_file output.wav \
  --speaker 0
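
To compare voices, the command above can be wrapped in a loop that renders the same sentence once per speaker. This is a minimal sketch, not part of the project: the function name `synth_all_speakers`, the `PIPER_BIN` override, and the `speaker_<id>.wav` naming are illustrative, and it assumes the 20 speakers are numbered 0 through 19.

```shell
# Sketch: synthesize one WAV per speaker ID with the Piper CLI.
# Assumption: speaker IDs are contiguous and start at 0.
# PIPER_BIN is a hypothetical override for the piper executable path.
synth_all_speakers() {
  local model="$1"                 # path to the .onnx model
  local text="$2"                  # text to synthesize
  local out_dir="${3:-samples}"    # output directory
  local num_speakers="${4:-20}"    # 20 speakers per the model overview
  local piper_bin="${PIPER_BIN:-piper}"
  local id=0

  mkdir -p "$out_dir" || return 1
  while [ "$id" -lt "$num_speakers" ]; do
    # Same flags as the single-speaker command: --model, --output_file, --speaker
    printf '%s\n' "$text" | "$piper_bin" \
      --model "$model" \
      --output_file "$out_dir/speaker_${id}.wav" \
      --speaker "$id" || return 1
    id=$((id + 1))
  done
}
```

Calling `synth_all_speakers moe-speech-20speakers-200epochs.onnx "こんにちは。" samples` would then write `samples/speaker_0.wav` through `samples/speaker_19.wav`, stopping at the first speaker ID that Piper rejects.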

Training Command

NCCL_DEBUG=INFO NCCL_P2P_DISABLE=1 NCCL_IB_DISABLE=1 uv run python -m piper_train \
  --dataset-dir /data/piper/dataset-moe-speech-20speakers \
  --accelerator gpu --devices 4 --precision 16-mixed \
  --max_epochs 200 --batch-size 32 --samples-per-speaker 4 \
  --checkpoint-epochs 1 --quality medium \
  --base_lr 2e-4 --disable_auto_lr_scaling \
  --ema-decay 0.9995 --num-workers 0 --no-pin-memory \
  --default_root_dir /data/piper/output-moe-speech-20speakers-lr2e4-fixed
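
As a rough aid for reading the command above, the number of samples consumed per optimizer step can be estimated from `--devices` and `--batch-size`. The sketch below assumes that `--batch-size` is applied per device, which is the usual behaviour under Lightning's distributed data-parallel training; piper-plus may define it differently, and `global_batch_size` is a hypothetical helper, not a piper-plus command.

```shell
# Sketch: samples per optimizer step across all GPUs.
# Assumption: --batch-size is per device; verify against piper-plus before relying on it.
global_batch_size() {
  local devices="$1"            # value passed to --devices
  local per_device_batch="$2"   # value passed to --batch-size
  echo $((devices * per_device_batch))
}

global_batch_size 4 32   # prints 128
```

Under that assumption, the fixed learning rate of 2e-4 (with automatic scaling disabled) applies to a global batch of 128 samples.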

Dataset

  • Source: moe-speech dataset
  • Total Speakers: 20
  • Total Utterances: ~60,000
  • Total Duration: ~90 hours
  • Phoneme Type: OpenJTalk
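
The totals above imply some rough per-utterance and per-speaker averages. The following sketch works them out; `dataset_stats` is an illustrative helper, and because the source figures are approximate, the results are only ballpark values. Actual per-speaker amounts may be uneven.

```shell
# Sketch: derive averages from the approximate dataset totals.
# Arguments: total utterances, total hours, number of speakers.
dataset_stats() {
  awk -v u="$1" -v h="$2" -v s="$3" 'BEGIN {
    printf "avg_utterance_sec=%.1f\n", h * 3600 / u
    printf "avg_utterances_per_speaker=%d\n", u / s
    printf "avg_hours_per_speaker=%.1f\n", h / s
  }'
}
```

Running `dataset_stats 60000 90 20` gives an average utterance length of about 5.4 seconds, with roughly 3,000 utterances and 4.5 hours of audio per speaker on average.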

Citation

@misc{piper-plus-moe,
  author = {ayousanz},
  title = {piper-plus-moe: Japanese Multi-Speaker TTS Model},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/ayousanz/piper-plus-moe}
}

Acknowledgements

  • Piper TTS - Original TTS framework
  • VITS - Model architecture
  • moe-speech dataset contributors