
# Igbo Tone & Diacritic Restoration (ByT5-small)

Automatically restores Igbo diacritics — tone marks (à, á, ā) and subdot vowels (ị, ọ, ụ) — from plain text. Built on ByT5-small (byte-level seq2seq), fine-tuned on a rebalanced dataset of 46,057 Igbo sentences.

Part of the Igbo Speech Project — this model serves as a preprocessor for TTS and a post-processor for ASR.

## Key Results

| Metric | Accuracy |
|---|---|
| Tone mark accuracy | 61.6% |
| Subdot accuracy (ị, ọ, ụ) | 88.2% |
| Overall diacritic accuracy | 78.7% |
| Word exact match | 34.3% |
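
The card doesn't spell out how these metrics are computed; a plausible character-aligned sketch (my own reconstruction, assuming the model changes only diacritics and never base letters) is:

```python
import unicodedata

TONE_MARKS = {"\u0300", "\u0301", "\u0304"}  # combining grave, acute, macron
SUBDOT = "\u0323"                            # combining dot below

def graphemes(s):
    """Split an NFD-normalized string into [base_char, {combining marks}] pairs."""
    out = []
    for ch in unicodedata.normalize("NFD", s):
        if unicodedata.combining(ch) and out:
            out[-1][1].add(ch)
        else:
            out.append([ch, set()])
    return out

def diacritic_scores(pred: str, ref: str):
    """Tone-mark and subdot accuracy, assuming pred and ref share the
    same base-letter sequence (the model only edits diacritics)."""
    p, r = graphemes(pred), graphemes(ref)
    assert [b for b, _ in p] == [b for b, _ in r], "base letters must match"
    tone_hit = tone_all = sub_hit = sub_all = 0
    for (_, pm), (_, rm) in zip(p, r):
        if rm & TONE_MARKS:                 # reference vowel carries a tone mark
            tone_all += 1
            tone_hit += (pm & TONE_MARKS) == (rm & TONE_MARKS)
        if SUBDOT in rm:                    # reference vowel carries a subdot
            sub_all += 1
            sub_hit += SUBDOT in pm
    return tone_hit / max(tone_all, 1), sub_hit / max(sub_all, 1)

def word_exact_match(pred: str, ref: str) -> float:
    """Fraction of words restored exactly."""
    pw, rw = pred.split(), ref.split()
    return sum(a == b for a, b in zip(pw, rw)) / max(len(rw), 1)
```

The mark inventory and the equal-base-letter assumption are guesses about the evaluation script, not documented behavior.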

## Why This Matters

Most Igbo text online lacks diacritics. We measured the diacritic gap across three sources:

| Source | Tone marking rate |
|---|---|
| Well-toned corpus | 96% of vowels |
| IgboAPI dictionary | 46% of vowels |
| African Voices (crowd-sourced) | 14% of vowels |

78% of African Voices transcripts have zero tone marks. Without automatic restoration, TTS systems receive ambiguous input and ASR output lacks proper orthography.
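
The rate above (the fraction of vowels carrying a tone mark) is straightforward to reproduce; a minimal sketch, assuming the vowel set and tone-mark inventory below match the project's counting script:

```python
import unicodedata

VOWELS = set("aeiou")                        # Igbo vowel bases (subdot vowels decompose to these)
TONE_MARKS = {"\u0300", "\u0301", "\u0304"}  # combining grave, acute, macron

def tone_marking_rate(text: str) -> float:
    """Fraction of vowels in `text` that carry a combining tone mark."""
    vowels = marked = 0
    pending = False  # True while scanning the combining marks of a vowel
    for ch in unicodedata.normalize("NFD", text.lower()):
        if unicodedata.combining(ch):
            if pending and ch in TONE_MARKS:
                marked += 1
                pending = False  # count each vowel at most once
            continue
        pending = ch in VOWELS
        if pending:
            vowels += 1
    return marked / vowels if vowels else 0.0
```

Subdots (U+0323) are deliberately not counted as tone marks, matching the corpus statistics above.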

## Model Details

| Property | Value |
|---|---|
| Base model | google/byt5-small |
| Architecture | ByT5 (byte-level T5, encoder-decoder) |
| Parameters | 300M |
| Task | Seq2seq: plain text → fully diacriticized text |
| Training data | 46,057 sentences (rebalanced: 56% toned corpus, 28% IgboAPI, 16% Bible) |
| Training time | ~32 hours on Apple M4 MPS (est. ~48 min on H100) |
| Inference | `num_beams=4`, `max_length=512` |

## Training Data Composition (v2 — rebalanced)

| Source | Sentences | Tone density | Weight |
|---|---|---|---|
| Well-toned corpus (4× oversample) | 26,800 | 96% | 56% |
| IgboAPI dictionary (3× oversample, normalized) | 12,900 | 46% | 28% |
| Igbo Bible (capped at 8K) | 6,400 | ~0% (subdots only) | 16% |

**Key insight:** v1 used 76% Bible data (no tones) and achieved only 48.4% tone accuracy. Rebalancing to 84% toned data in v2 improved tone accuracy to 61.6% (+13 pp). Data composition > model size.

## Usage

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_dir = "path/to/tone_model/best"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = T5ForConditionalGeneration.from_pretrained(model_dir)
model.eval()

text = "Kedu ka i mere"
inputs = tokenizer(text, return_tensors="pt", max_length=256, truncation=True)
outputs = model.generate(**inputs, max_length=512, num_beams=4, early_stopping=True)
restored = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(restored)  # "Kèdù kà í mèrè"
```

### Long Text

```python
from igbo_tts.tone_model.predict import ToneRestorer

restorer = ToneRestorer(model_dir="path/to/tone_model/best")
text = restorer.restore_long("Igbo bu asusu ndi Igbo. Anyi na-asu ya kwa ubochi.")
```
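
`restore_long` presumably splits long input into model-sized pieces before restoring each one; a standalone sketch of that splitting step (the sentence-boundary heuristic and the character budget are my assumptions, not the project's actual code):

```python
import re

def chunk_text(text: str, max_chars: int = 256) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence
    boundaries so each chunk gives the model full sentence context.
    A single sentence longer than max_chars is kept whole."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sent in sentences:
        candidate = f"{current} {sent}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sent
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then go through the model exactly as in the short-text example, with the restored outputs joined back together.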

## Pipeline Role

```
                    ┌─────────────────────┐
User text (untoned) │    Tone Model       │  Toned text
"Kedu ka i mere" ──►│    (this model)     │──► "Kèdù kà í mèrè"
                    └─────────────────────┘
                               │
                  ┌────────────┼────────────┐
                  ▼            ▼            ▼
             TTS input     ASR output    Keyboard
            (strip tones,  (add tones    autocorrect
            keep subdots)  to plain)
```
**For TTS:** Restores subdots (ụ, ọ), which are essential for pronunciation. Tone marks are stripped before synthesis (F5-TTS is trained on untoned text).

**For ASR:** Post-processes untoned ASR output into proper Igbo orthography.

**For keyboards:** Real-time diacritization as users type plain Igbo text.
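
The TTS step strips tone marks while keeping subdots; with Unicode decomposition that is a few lines (a sketch of the idea, not the project's actual preprocessing code):

```python
import unicodedata

TONE_MARKS = {"\u0300", "\u0301", "\u0304"}  # combining grave, acute, macron

def strip_tones(text: str) -> str:
    """Remove tone marks but keep subdots (U+0323), so ị/ọ/ụ survive."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)
```

NFD splits each accented vowel into a base letter plus combining marks; only the tone marks are dropped, and NFC recomposes the rest, so ụ̀ becomes ụ rather than plain u.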

## API

Available via `api/server.py`:

```bash
# Single text
curl -X POST http://localhost:8000/diacriticize \
  -H "Content-Type: application/json" \
  -d '{"text": "Kedu ka i mere"}'

# Batch mode
curl -X POST http://localhost:8000/diacriticize \
  -H "Content-Type: application/json" \
  -d '{"texts": ["Kedu ka i mere", "Igbo bu asusu anyi"]}'
```

Response fields: `input`, `output`, `tone_ratio_before`, `tone_ratio_after`.

## Weights

Model weights are not hosted on this repository. See the GitHub repo for access instructions.

Checkpoint files:

- `model.safetensors` (1.1 GB) — ByT5-small fine-tuned weights
- `config.json`, `tokenizer_config.json`, `special_tokens_map.json`, `added_tokens.json` — model configs
- `generation_config.json` — beam search settings
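
Given the inference settings listed under Model Details, the beam-search portion of `generation_config.json` plausibly looks like this (illustrative values matching the settings above, not the actual shipped file, which will also carry token-id fields):

```json
{
  "num_beams": 4,
  "max_length": 512,
  "early_stopping": true
}
```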

## Known Limitations

- Tone accuracy plateaued at 61.6% — limited by training data quality and convention conflicts
- Single-word ambiguity: words like *akwa* (cry/cloth/egg/bed) require sentence context for correct tone
- ṅ (dot-above) accuracy ~0%: too rare in training data (6 test instances)
- Convention conflict: training mixes full marking (corpus) and contrastive marking (IgboAPI)

## License

This model is released under CC BY-NC-SA 4.0.

- **BY:** You must give appropriate credit
- **NC:** Non-commercial use only
- **SA:** Derivatives must use the same license

Note: the base model (ByT5-small) is Apache 2.0 and the training data (African Voices) is CC BY 4.0, so this model could use a more permissive license. We use CC BY-NC-SA for consistency across the Igbo Speech Project models.

## Citation

```bibtex
@misc{chimezie2026igbotone,
  title={Igbo Tone and Diacritic Restoration with ByT5},
  author={Chimezie, Emmanuel},
  year={2026},
  url={https://github.com/chimezie90/igbotts}
}
```

## Author

Emmanuel Chimezie — Mexkoy Labs
