---
license: other
language:
- ja
---
# piper-plus-moe

A Japanese multi-speaker text-to-speech model based on the VITS architecture, compatible with [Piper TTS](https://github.com/rhasspy/piper).

## Model Overview

| Item | Value |
|------|-------|
| Language | Japanese (ja) |
| Phoneme Type | OpenJTalk |
| Number of Speakers | 20 |
| Sample Rate | 22050 Hz |
| Training Epochs | 200 |
| Architecture | VITS |
| Framework | [piper-plus](https://github.com/ayutaz/piper-plus) |

## Usage

### With Piper TTS

```bash
# Download model files
wget https://huggingface.co/ayousanz/piper-plus-moe/resolve/main/moe-speech-20speakers-200epochs.onnx
wget https://huggingface.co/ayousanz/piper-plus-moe/resolve/main/moe-speech-20speakers-200epochs.onnx.json

# Run inference with Piper
echo "こんにちは、私は日本語音声合成モデルです。" | piper \
  --model moe-speech-20speakers-200epochs.onnx \
  --output_file output.wav \
  --speaker 0
```
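
To compare voices, the same command can be repeated once per speaker. The following is a minimal sketch, assuming speaker IDs are contiguous and zero-based (0 through 19), which this card does not state explicitly. The `speaker_NN.wav` naming scheme, the function names, and the `DRY_RUN` switch are illustrative and not part of Piper.

```bash
# Sketch: render the same sentence once per speaker ID.
# Assumption: the 20 speakers use IDs 0..19 (not confirmed by the model card).
MODEL="moe-speech-20speakers-200epochs.onnx"
NUM_SPEAKERS=20
TEXT="こんにちは、私は日本語音声合成モデルです。"

# Print the output filename for a speaker ID (hypothetical naming scheme).
output_name() {
  printf 'speaker_%02d.wav\n' "$1"
}

# Synthesize TEXT for every speaker. With DRY_RUN=1 the piper commands
# are only printed, which is handy for checking the loop before running it.
synthesize_all() {
  sid=0
  while [ "$sid" -lt "$NUM_SPEAKERS" ]; do
    out="$(output_name "$sid")"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "piper --model $MODEL --output_file $out --speaker $sid"
    else
      echo "$TEXT" | piper --model "$MODEL" --output_file "$out" --speaker "$sid"
    fi
    sid=$((sid + 1))
  done
}
```

Running `DRY_RUN=1 synthesize_all` first prints the 20 commands without invoking `piper`; calling `synthesize_all` without `DRY_RUN` writes one WAV file per speaker into the current directory.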

## Training Command

```bash
NCCL_DEBUG=INFO NCCL_P2P_DISABLE=1 NCCL_IB_DISABLE=1 uv run python -m piper_train \
  --dataset-dir /data/piper/dataset-moe-speech-20speakers \
  --accelerator gpu --devices 4 --precision 16-mixed \
  --max_epochs 200 --batch-size 32 --samples-per-speaker 4 \
  --checkpoint-epochs 1 --quality medium \
  --base_lr 2e-4 --disable_auto_lr_scaling \
  --ema-decay 0.9995 --num-workers 0 --no-pin-memory \
  --default_root_dir /data/piper/output-moe-speech-20speakers-lr2e4-fixed
```

## Dataset

- Source: moe-speech dataset
- Total Speakers: 20
- Total Utterances: ~60,000
- Total Duration: ~90 hours
- Phoneme Type: OpenJTalk
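
For a rough sense of scale, the approximate totals above can be turned into averages. This is only a back-of-the-envelope sketch: the inputs are the rounded figures from the list, the function name is illustrative, and the actual per-speaker split is not stated here, so the per-speaker numbers are means rather than guarantees.

```bash
# Rough averages derived from the approximate dataset totals.
SPEAKERS=20
UTTERANCES=60000   # approximate (~60,000)
HOURS=90           # approximate (~90 hours)

# Print the mean utterance length and the mean share per speaker.
dataset_averages() {
  awk -v s="$SPEAKERS" -v u="$UTTERANCES" -v h="$HOURS" 'BEGIN {
    printf "avg utterance length: %.1f s\n", h * 3600 / u
    printf "utterances per speaker: %d\n", u / s
    printf "hours per speaker: %.1f\n", h / s
  }'
}

dataset_averages
```

With these inputs the averages come out to about 5.4 seconds per utterance, and roughly 3,000 utterances and 4.5 hours of audio per speaker.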

## Citation

```bibtex
@misc{piper-plus-moe,
  author = {ayousanz},
  title = {piper-plus-moe: Japanese Multi-Speaker TTS Model},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/ayousanz/piper-plus-moe}
}
```

## Acknowledgements

- [Piper TTS](https://github.com/rhasspy/piper) - Original TTS framework
- [VITS](https://github.com/jaywalnut310/vits) - Model architecture
- moe-speech dataset contributors