Kikuyu ASR with Extended Vocabulary (Fine-tuned from Pre-trained Adapters)

This model addresses CTC collapse on double vowels in Kikuyu. It was fine-tuned from pre-trained per-language adapters after extending the vocabulary with dedicated double-vowel tokens.
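
The sketch below is illustrative only and not taken from the model code (the `ctc_greedy_collapse` helper is hypothetical); it shows why character-level CTC decoding tends to merge a double vowel into a single one, and why a dedicated double-vowel token avoids the problem.

```python
# Illustrative only (not from the model code): why character-level CTC tends
# to lose double vowels, and why dedicated double-vowel tokens help.

def ctc_greedy_collapse(frames, blank="-"):
    """Standard CTC post-processing: merge repeated labels, then drop blanks."""
    out, prev = [], None
    for f in frames:
        if f != prev and f != blank:
            out.append(f)
        prev = f
    return "".join(out)

# With single-character labels, two consecutive 'a' frames are merged unless
# the model emits a blank between them -- the failure mode behind "CTC collapse".
print(ctc_greedy_collapse(["m", "a", "a"]))       # -> "ma"  (double vowel lost)
print(ctc_greedy_collapse(["m", "a", "-", "a"]))  # -> "maa" (needs an explicit blank)

# With an extended vocabulary, "aa" is a single label, so no blank is required:
print(ctc_greedy_collapse(["m", "aa"]))           # -> "maa"
```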

Model Description

  • Model ID: mutisya/w2v-bert-v3Hybrid-kik-extendVocab-v1.1
  • Architecture: Hybrid V3 (W2V-BERT 2.0 + MMS-style adapters + Stable Decoder)
  • Base Model: facebook/w2v-bert-2.0 (frozen)
  • Pre-trained from: mutisya/w2v-bert-per-language-6lang-25_50-v1
  • Model Size: 0.6B parameters (F32, Safetensors)
  • Vocabulary Extension: Added double-vowel tokens ['aa', 'ee', 'ii', 'oo', 'uu', 'ĩĩ', 'ũũ']
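
If the checkpoint loads with the standard transformers classes (the custom Hybrid V3 decoder may instead require its own loading code, which this card does not show), transcription might look roughly like the sketch below; the audio path is a placeholder.

```python
# Hedged sketch: assumes the repo exposes a Wav2Vec2BertProcessor and a
# CTC-compatible checkpoint; the Hybrid V3 decoder may need custom code instead.
import torch
import librosa
from transformers import Wav2Vec2BertForCTC, Wav2Vec2BertProcessor

model_id = "mutisya/w2v-bert-v3Hybrid-kik-extendVocab-v1.1"
processor = Wav2Vec2BertProcessor.from_pretrained(model_id)
model = Wav2Vec2BertForCTC.from_pretrained(model_id).eval()

# W2V-BERT 2.0 expects 16 kHz mono audio; "sample.wav" is a placeholder path.
speech, _ = librosa.load("sample.wav", sr=16_000, mono=True)
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])
```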

Training Details

  • Starting Point: Pre-trained Kikuyu adapters (WER: 21.37%)
  • Dataset: mutisya/Kikuyu_asr_v24_23_1-filtered
  • Training Samples: 30000
  • Fine-tuning Epochs: 10
  • Learning Rate: 0.0005
  • Final WER: 14.09%
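
For orientation, the hyperparameters above might map onto a Hugging Face Trainer configuration roughly as sketched below; only the epoch count and learning rate come from this card, the remaining settings are assumptions, and the actual Hybrid V3 training script is not published here.

```python
# Hypothetical Trainer configuration mirroring the hyperparameters above.
# Only num_train_epochs and learning_rate come from the card; the rest are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="w2v-bert-v3Hybrid-kik-extendVocab-v1.1",
    num_train_epochs=10,             # fine-tuning epochs (from the card)
    learning_rate=5e-4,              # 0.0005 (from the card)
    per_device_train_batch_size=8,   # assumption
    gradient_accumulation_steps=2,   # assumption
    warmup_steps=500,                # assumption
    logging_steps=100,               # assumption
    save_steps=1000,                 # assumption
)
```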

Improvement

Metric   Pre-trained   After Extended Vocab   Improvement
WER      21.37%        14.09%                 7.28 pp

Error Analysis

Category              Count   Percentage
Exact Match           183     36.6%
Double Vowel Error     79     15.8%
Word Boundary Error    22      4.4%
Other                 172     34.4%
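
The card does not publish the analysis script; the sketch below shows one plausible way such categories could be assigned. The `categorize` helper and the example strings are hypothetical.

```python
# Illustrative only: one way the categories above could be assigned. The
# categorize() helper and the example strings are hypothetical, not the
# actual analysis script used for this card.
import re

def squeeze_double_vowels(s: str) -> str:
    """Collapse any doubled vowel (aa -> a, ĩĩ -> ĩ, ...) into a single one."""
    return re.sub(r"([aeiouĩũ])\1", r"\1", s)

def categorize(reference: str, hypothesis: str) -> str:
    if hypothesis == reference:
        return "Exact Match"
    if squeeze_double_vowels(hypothesis) == squeeze_double_vowels(reference):
        return "Double Vowel Error"     # only vowel-length mistakes
    if hypothesis.replace(" ", "") == reference.replace(" ", ""):
        return "Word Boundary Error"    # only word segmentation differs
    return "Other"

print(categorize("mũndũ ũcio", "mũũndũ ũcio"))  # -> "Double Vowel Error"
print(categorize("mũndũ ũcio", "mũndũũcio"))    # -> "Word Boundary Error"
```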