
# Igbo Tone & Diacritic Restoration (ByT5-small)

Automatically restores Igbo diacritics — tone marks (à, á, ā) and subdot vowels (ị, ọ, ụ) — from plain text. Built on ByT5-small (byte-level seq2seq), fine-tuned on a rebalanced dataset of 46,057 Igbo sentences.

Part of the Igbo Speech Project — this model serves as a preprocessor for TTS and a post-processor for ASR.

## Key Results

| Metric | Accuracy |
|---|---|
| Tone mark accuracy | 61.6% |
| Subdot accuracy (ị, ọ, ụ) | 88.2% |
| Overall diacritic accuracy | 78.7% |
| Word exact match | 34.3% |
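
The card doesn't spell out how these metrics are computed; a plausible character-aligned sketch (my own reconstruction, assuming the model changes only diacritics and never base letters) is:

```python
import unicodedata

TONE_MARKS = {"\u0300", "\u0301", "\u0304"}  # combining grave, acute, macron
SUBDOT = "\u0323"                            # combining dot below

def graphemes(s):
    """Split an NFD-normalized string into [base_char, {combining marks}] pairs."""
    out = []
    for ch in unicodedata.normalize("NFD", s):
        if unicodedata.combining(ch) and out:
            out[-1][1].add(ch)
        else:
            out.append([ch, set()])
    return out

def diacritic_scores(pred: str, ref: str):
    """Tone-mark and subdot accuracy, assuming pred and ref share the
    same base-letter sequence (the model only edits diacritics)."""
    p, r = graphemes(pred), graphemes(ref)
    assert [b for b, _ in p] == [b for b, _ in r], "base letters must match"
    tone_hit = tone_all = sub_hit = sub_all = 0
    for (_, pm), (_, rm) in zip(p, r):
        if rm & TONE_MARKS:                 # reference vowel carries a tone mark
            tone_all += 1
            tone_hit += (pm & TONE_MARKS) == (rm & TONE_MARKS)
        if SUBDOT in rm:                    # reference vowel carries a subdot
            sub_all += 1
            sub_hit += SUBDOT in pm
    return tone_hit / max(tone_all, 1), sub_hit / max(sub_all, 1)

def word_exact_match(pred: str, ref: str) -> float:
    """Fraction of words restored exactly."""
    pw, rw = pred.split(), ref.split()
    return sum(a == b for a, b in zip(pw, rw)) / max(len(rw), 1)
```

The mark inventory and the equal-base-letter assumption are guesses about the evaluation script, not documented behavior.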

## Why This Matters

Most Igbo text online lacks diacritics. We measured the diacritic gap across three sources:

| Source | Tone marking rate |
|---|---|
| Well-toned corpus | 96% of vowels |
| IgboAPI dictionary | 46% of vowels |
| African Voices (crowd-sourced) | 14% of vowels |

78% of African Voices transcripts have zero tone marks. Without automatic restoration, TTS systems receive ambiguous input and ASR output lacks proper orthography.
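
The rate above (the fraction of vowels carrying a tone mark) is straightforward to reproduce; a minimal sketch, assuming the vowel set and tone-mark inventory below match the project's counting script:

```python
import unicodedata

VOWELS = set("aeiou")                        # Igbo vowel bases (subdot vowels decompose to these)
TONE_MARKS = {"\u0300", "\u0301", "\u0304"}  # combining grave, acute, macron

def tone_marking_rate(text: str) -> float:
    """Fraction of vowels in `text` that carry a combining tone mark."""
    vowels = marked = 0
    pending = False  # True while scanning the combining marks of a vowel
    for ch in unicodedata.normalize("NFD", text.lower()):
        if unicodedata.combining(ch):
            if pending and ch in TONE_MARKS:
                marked += 1
                pending = False  # count each vowel at most once
            continue
        pending = ch in VOWELS
        if pending:
            vowels += 1
    return marked / vowels if vowels else 0.0
```

Subdots (U+0323) are deliberately not counted as tone marks, matching the corpus statistics above.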

## Model Details

| Property | Value |
|---|---|
| Base model | google/byt5-small |
| Architecture | ByT5 (byte-level T5, encoder-decoder) |
| Parameters | 300M |
| Task | Seq2seq: plain text → fully diacriticized text |
| Training data | 46,057 sentences (rebalanced: 56% toned corpus, 28% IgboAPI, 16% Bible) |
| Training time | ~32 hours on Apple M4 MPS (est. ~48 min on H100) |
| Inference | `num_beams=4`, `max_length=512` |

## Training Data Composition (v2 — rebalanced)

| Source | Sentences | Tone density | Weight |
|---|---|---|---|
| Well-toned corpus (4× oversample) | 26,800 | 96% | 56% |
| IgboAPI dictionary (3× oversample, normalized) | 12,900 | 46% | 28% |
| Igbo Bible (capped at 8K) | 6,400 | ~0% (subdots only) | 16% |

**Key insight:** v1 used 76% Bible data (no tones) and achieved only 48.4% tone accuracy. Rebalancing to 84% toned data in v2 improved tone accuracy to 61.6% (+13 pp). Data composition > model size.

## Usage

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_dir = "path/to/tone_model/best"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = T5ForConditionalGeneration.from_pretrained(model_dir)
model.eval()

text = "Kedu ka i mere"
inputs = tokenizer(text, return_tensors="pt", max_length=256, truncation=True)
outputs = model.generate(**inputs, max_length=512, num_beams=4, early_stopping=True)
restored = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(restored)  # "Kèdù kà í mèrè"
```

### Long Text

```python
from igbo_tts.tone_model.predict import ToneRestorer

restorer = ToneRestorer(model_dir="path/to/tone_model/best")
text = restorer.restore_long("Igbo bu asusu ndi Igbo. Anyi na-asu ya kwa ubochi.")
```
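
`restore_long` presumably splits long input into model-sized pieces before restoring each one; a standalone sketch of that splitting step (the sentence-boundary heuristic and the character budget are my assumptions, not the project's actual code):

```python
import re

def chunk_text(text: str, max_chars: int = 256) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence
    boundaries so each chunk gives the model full sentence context.
    A single sentence longer than max_chars is kept whole."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sent in sentences:
        candidate = f"{current} {sent}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sent
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then go through the model exactly as in the short-text example, with the restored outputs joined back together.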

## Pipeline Role

```
                    ┌─────────────────────┐
User text (untoned) │    Tone Model       │  Toned text
"Kedu ka i mere" ──►│    (this model)     │──► "Kèdù kà í mèrè"
                    └─────────────────────┘
                               │
                  ┌────────────┼────────────┐
                  ▼            ▼            ▼
             TTS input     ASR output    Keyboard
            (strip tones,  (add tones    autocorrect
            keep subdots)  to plain)
```
**For TTS:** Restores subdots (ụ, ọ), which are essential for pronunciation. Tone marks are stripped before synthesis (F5-TTS is trained on untoned text).

**For ASR:** Post-processes untoned ASR output into proper Igbo orthography.

**For keyboards:** Real-time diacritization as users type plain Igbo text.
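
The TTS step strips tone marks while keeping subdots; with Unicode decomposition that is a few lines (a sketch of the idea, not the project's actual preprocessing code):

```python
import unicodedata

TONE_MARKS = {"\u0300", "\u0301", "\u0304"}  # combining grave, acute, macron

def strip_tones(text: str) -> str:
    """Remove tone marks but keep subdots (U+0323), so ị/ọ/ụ survive."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)
```

NFD splits each accented vowel into a base letter plus combining marks; only the tone marks are dropped, and NFC recomposes the rest, so ụ̀ becomes ụ rather than plain u.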

## API

Available via `api/server.py`:

```bash
# Single text
curl -X POST http://localhost:8000/diacriticize \
  -H "Content-Type: application/json" \
  -d '{"text": "Kedu ka i mere"}'

# Batch mode
curl -X POST http://localhost:8000/diacriticize \
  -H "Content-Type: application/json" \
  -d '{"texts": ["Kedu ka i mere", "Igbo bu asusu anyi"]}'
```

Response fields: `input`, `output`, `tone_ratio_before`, `tone_ratio_after`.

## Weights

Model weights are not hosted on this repository. See the GitHub repo for access instructions.

Checkpoint files:

- `model.safetensors` (1.1 GB) — ByT5-small fine-tuned weights
- `config.json`, `tokenizer_config.json`, `special_tokens_map.json`, `added_tokens.json` — model configs
- `generation_config.json` — beam search settings
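
Given the inference settings listed under Model Details, the beam-search portion of `generation_config.json` plausibly looks like this (illustrative values matching the settings above, not the actual shipped file, which will also carry token-id fields):

```json
{
  "num_beams": 4,
  "max_length": 512,
  "early_stopping": true
}
```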

## Known Limitations

- Tone accuracy plateaued at 61.6% — limited by training data quality and convention conflicts
- Single-word ambiguity: words like *akwa* (cry/cloth/egg/bed) require sentence context for correct tone
- ṅ (dot-above) accuracy ~0%: too rare in training data (6 test instances)
- Convention conflict: training mixes full marking (corpus) and contrastive marking (IgboAPI)

## License

This model is released under CC BY-NC-SA 4.0.

- **BY:** You must give appropriate credit
- **NC:** Non-commercial use only
- **SA:** Derivatives must use the same license

Note: the base model (ByT5-small) is Apache 2.0 and the training data (African Voices) is CC BY 4.0, so this model could use a more permissive license. We use CC BY-NC-SA for consistency across the Igbo Speech Project models.

## Citation

```bibtex
@misc{chimezie2026igbotone,
  title={Igbo Tone and Diacritic Restoration with ByT5},
  author={Chimezie, Emmanuel},
  year={2026},
  url={https://github.com/chimezie90/igbotts}
}
```

## Author

Emmanuel Chimezie — Mexkoy Labs
