license: other
language:
  - ja

piper-plus-moe

A Japanese multi-speaker text-to-speech model based on the VITS architecture and compatible with Piper TTS.

Model Overview

Item	Value
Language	Japanese (ja)
Phoneme Type	OpenJTalk
Number of Speakers	20
Sample Rate	22050 Hz
Training Epochs	200
Architecture	VITS
Framework	piper-plus
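
The values in this table can be cross-checked against the `moe-speech-20speakers-200epochs.onnx.json` config listed under Usage. The following is a minimal sketch, assuming the config follows the upstream Piper layout (`audio.sample_rate`, `num_speakers`, `speaker_id_map`); the piper-plus config may use different or additional keys, so missing fields are returned as `None` rather than treated as errors. The helper name `summarize_config` is hypothetical.

```python
import json
from pathlib import Path


def summarize_config(config_path):
    """Return a few fields from a Piper-style voice config.

    Assumption: keys follow the upstream Piper layout. Any key that is
    absent (or null) yields None or an empty list instead of raising.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    audio = config.get("audio") or {}
    speaker_map = config.get("speaker_id_map") or {}
    return {
        "sample_rate": audio.get("sample_rate"),
        "num_speakers": config.get("num_speakers"),
        "speaker_names": sorted(speaker_map),
    }


if __name__ == "__main__":
    path = Path("moe-speech-20speakers-200epochs.onnx.json")
    # Only inspect the config if it has already been downloaded.
    if path.exists():
        print(summarize_config(path))
```

If the keys match, the reported sample rate and speaker count should agree with the table above.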

Usage

With Piper TTS

# Download model files
wget https://huggingface.co/ayousanz/piper-plus-moe/resolve/main/moe-speech-20speakers-200epochs.onnx
wget https://huggingface.co/ayousanz/piper-plus-moe/resolve/main/moe-speech-20speakers-200epochs.onnx.json

# Run inference with Piper (--speaker selects one of the model's 20 voices by numeric ID)
echo "こんにちは、私は日本語音声合成モデルです。" | piper \
  --model moe-speech-20speakers-200epochs.onnx \
  --output_file output.wav \
  --speaker 0
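
To compare voices, the same command can be repeated once per speaker. The sketch below wraps the invocation above in Python using only the flags shown there. It assumes that speaker IDs run contiguously from 0 to 19, which the README does not state explicitly, and the output file naming (`output_speakerNN.wav`) is a hypothetical choice made here for illustration.

```python
import shutil
import subprocess
from pathlib import Path

MODEL = "moe-speech-20speakers-200epochs.onnx"
NUM_SPEAKERS = 20  # from the Model Overview table


def build_command(speaker_id, model=MODEL):
    """Build the piper invocation from the Usage section for one speaker."""
    # Assumption: valid IDs are 0 .. NUM_SPEAKERS - 1.
    if not 0 <= speaker_id < NUM_SPEAKERS:
        raise ValueError(f"speaker_id must be in 0..{NUM_SPEAKERS - 1}")
    return [
        "piper",
        "--model", model,
        "--output_file", f"output_speaker{speaker_id:02d}.wav",
        "--speaker", str(speaker_id),
    ]


def synthesize_all(text):
    """Run piper once per speaker, feeding the text on stdin."""
    for speaker_id in range(NUM_SPEAKERS):
        subprocess.run(
            build_command(speaker_id),
            input=text.encode("utf-8"),
            check=True,  # stop at the first failing invocation
        )


if __name__ == "__main__":
    # Only synthesize when piper is on PATH and the model is present.
    if shutil.which("piper") and Path(MODEL).exists():
        synthesize_all("こんにちは、私は日本語音声合成モデルです。")
```

Keeping command construction separate from execution makes it easy to print and inspect the commands before running them.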

Training Command

NCCL_DEBUG=INFO NCCL_P2P_DISABLE=1 NCCL_IB_DISABLE=1 uv run python -m piper_train \
  --dataset-dir /data/piper/dataset-moe-speech-20speakers \
  --accelerator gpu --devices 4 --precision 16-mixed \
  --max_epochs 200 --batch-size 32 --samples-per-speaker 4 \
  --checkpoint-epochs 1 --quality medium \
  --base_lr 2e-4 --disable_auto_lr_scaling \
  --ema-decay 0.9995 --num-workers 0 --no-pin-memory \
  --default_root_dir /data/piper/output-moe-speech-20speakers-lr2e4-fixed
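
For a rough sense of scale, the flags above can be turned into back-of-the-envelope numbers. This sketch assumes that `--batch-size` is applied per device under data-parallel training, that each epoch visits every utterance once (ignoring any effect of `--samples-per-speaker`), and that the EMA is updated on every optimizer step. None of these are confirmed by the README, so treat the results as estimates only.

```python
import math

devices = 4          # --devices 4
batch_size = 32      # --batch-size 32 (assumed to be per device)
max_epochs = 200     # --max_epochs 200
ema_decay = 0.9995   # --ema-decay 0.9995
utterances = 60_000  # approximate figure from the Dataset section

# Samples consumed per optimizer step across all devices.
effective_batch = devices * batch_size

# Steps needed to see every utterance once, under the assumptions above.
steps_per_epoch = math.ceil(utterances / effective_batch)
total_steps = steps_per_epoch * max_epochs

# Common rule of thumb: an EMA with decay d averages over about 1 / (1 - d) steps.
ema_horizon_steps = round(1 / (1 - ema_decay))

print(f"effective batch size: {effective_batch}")
print(f"steps per epoch (approx.): {steps_per_epoch}")
print(f"total steps (approx.): {total_steps}")
print(f"EMA averaging horizon (approx. steps): {ema_horizon_steps}")
```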

Dataset

Source: moe-speech dataset
Total Speakers: 20
Total Utterances: ~60,000
Total Duration: ~90 hours
Phoneme Type: OpenJTalk
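
The approximate totals above imply the following averages. This is simple arithmetic on the listed figures; it assumes data is spread evenly across speakers, which the README does not state, so the per-speaker numbers are only indicative.

```python
speakers = 20
utterances = 60_000  # approximate
hours = 90           # approximate

# Mean utterance length in seconds.
avg_utterance_seconds = hours * 3600 / utterances

# Per-speaker averages, assuming an even split across speakers.
utterances_per_speaker = utterances / speakers
hours_per_speaker = hours / speakers

print(f"average utterance length: {avg_utterance_seconds:.1f} s")
print(f"utterances per speaker: {utterances_per_speaker:.0f}")
print(f"hours per speaker: {hours_per_speaker:.1f}")
```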

Citation

@misc{piper-plus-moe,
  author = {ayousanz},
  title = {piper-plus-moe: Japanese Multi-Speaker TTS Model},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/ayousanz/piper-plus-moe}
}

Acknowledgements

Piper TTS - Original TTS framework
VITS - Model architecture
moe-speech dataset contributors