mlx-community/Irodori-TTS-600M-v3-VoiceDesign-8bit

This model was converted to MLX format from Aratako/Irodori-TTS-600M-v3-VoiceDesign using mlx-audio version 0.4.3.

Quantized to 8-bit (group_size=64). For full precision see mlx-community/Irodori-TTS-600M-v3-VoiceDesign-fp16.

Refer to the original model card for more details on the model.

The v3 VoiceDesign variant supports dual conditioning: a text caption describing the desired voice style, and optionally a reference audio clip. Both can be provided simultaneously.

Use with mlx-audio

pip install -U mlx-audio

Caption only (no reference audio)

python -m mlx_audio.tts.generate \
  --model mlx-community/Irodori-TTS-600M-v3-VoiceDesign-8bit \
  --text "こんにちは、テストです。" \
  --instruct "穏やかで落ち着いた女性の声。ゆっくりと話す。"

Reference audio + caption (dual mode)

python -m mlx_audio.tts.generate \
  --model mlx-community/Irodori-TTS-600M-v3-VoiceDesign-8bit \
  --text "こんにちは、テストです。" \
  --ref_audio path/to/reference.wav \
  --instruct "穏やかで落ち着いた話し方。"

Python Example

from mlx_audio.tts.utils import load

model = load("mlx-community/Irodori-TTS-600M-v3-VoiceDesign-8bit")

# Caption only
results = list(model.generate(
    "こんにちは、テストです。",  # "Hello, this is a test."
    caption="穏やかで落ち着いた女性の声。",  # "A calm, composed female voice."
))

# Dual mode (ref audio + caption)
import mlx.core as mx
ref_audio = mx.array(...)  # shape (1, num_samples)
results = list(model.generate(
    "こんにちは、テストです。",  # "Hello, this is a test."
    ref_audio=ref_audio,
    caption="穏やかで落ち着いた話し方。",  # "A calm, composed way of speaking."
))
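
The dual-mode example leaves the contents of ref_audio open and states only its shape, (1, num_samples). As a minimal sketch of one way to build such an input, the helper below reads a 16-bit PCM WAV file with the Python standard library, downmixes it to mono, and returns a nested list of that shape. The helper name load_ref_audio is hypothetical and is not part of mlx-audio. The float range of [-1.0, 1.0) and the mono downmix are assumptions. The sample rate the model expects is not stated here, so the helper returns the file's rate unchanged and leaves any resampling to the caller.

```python
import array
import sys
import wave


def load_ref_audio(path):
    """Read a 16-bit PCM WAV file and return (samples, sample_rate).

    Hypothetical helper, not part of mlx-audio. `samples` is a nested list
    of shape (1, num_samples) holding floats in [-1.0, 1.0) (an assumption).
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    pcm = array.array("h")  # signed 16-bit integers
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian

    # Average the interleaved channels of each frame down to mono,
    # then scale from the int16 range to [-1.0, 1.0).
    mono = [
        sum(pcm[i:i + channels]) / (channels * 32768.0)
        for i in range(0, len(pcm), channels)
    ]
    return [mono], sample_rate


# Possible usage with the dual-mode example (requires mlx):
#   import mlx.core as mx
#   samples, sample_rate = load_ref_audio("path/to/reference.wav")
#   ref_audio = mx.array(samples)  # shape (1, num_samples)
```

Returning the sample rate alongside the samples makes it easy to check the clip against whatever rate the original model card specifies before passing it to model.generate.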