# Kikuyu ASR with Extended Vocabulary (Fine-tuned from Pre-trained Adapters)
This model addresses CTC collapse on Kikuyu double vowels: it was fine-tuned from pre-trained per-language adapters with the CTC vocabulary extended to include explicit double-vowel tokens.
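For context: greedy CTC decoding merges consecutive repeats of the same label, so with a purely character-level vocabulary a double vowel such as "aa" only survives if the model emits a blank frame between the two vowels; otherwise it collapses to a single "a". A minimal illustration of that merge rule (the frame label sequences below are made up for demonstration):

```python
# Greedy CTC decoding: collapse repeated labels, then drop blanks.
BLANK = "<pad>"

def ctc_greedy_collapse(frame_labels):
    """Merge consecutive duplicate labels and remove blanks."""
    out = []
    prev = None
    for label in frame_labels:
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return "".join(out)

# Two adjacent 'a' frames with no blank between them collapse to one 'a'.
print(ctc_greedy_collapse(["h", "a", "a"]))          # -> "ha"
# Only an intervening blank preserves the double vowel.
print(ctc_greedy_collapse(["h", "a", BLANK, "a"]))   # -> "haa"
# A dedicated "aa" token sidesteps the problem entirely.
print(ctc_greedy_collapse(["h", "aa"]))              # -> "haa"
```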
## Model Description
- Architecture: Hybrid V3 (W2V-BERT 2.0 + MMS-style adapters + Stable Decoder)
- Base Model: facebook/w2v-bert-2.0 (frozen)
- Pre-trained from: mutisya/w2v-bert-per-language-6lang-25_50-v1
- Vocabulary Extension: Added double-vowel tokens ['aa', 'ee', 'ii', 'oo', 'uu', 'ĩĩ', 'ũũ']
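One plausible way to carry out such a vocabulary extension with 🤗 Transformers is sketched below. It assumes the standard Wav2Vec2CTCTokenizer vocab.json layout and the stock Wav2Vec2BertForCTC class; the card's Hybrid V3 architecture may use a custom model class instead, and the file paths and special-token names are illustrative assumptions.

```python
# Sketch: append double-vowel tokens to the CTC vocab and reload the
# pre-trained checkpoint with a larger output head.
# Paths, special-token names, and the model class are assumptions.
import json
from transformers import Wav2Vec2BertForCTC, Wav2Vec2CTCTokenizer

DOUBLE_VOWELS = ["aa", "ee", "ii", "oo", "uu", "ĩĩ", "ũũ"]

# 1. Add the new tokens to the existing vocab.json with fresh indices.
with open("vocab.json", encoding="utf-8") as f:
    vocab = json.load(f)
for token in DOUBLE_VOWELS:
    if token not in vocab:
        vocab[token] = len(vocab)
with open("vocab.json", "w", encoding="utf-8") as f:
    json.dump(vocab, f, ensure_ascii=False)

tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)

# 2. Reload the pre-trained adapters with the larger vocabulary; the
#    mismatched lm_head is re-initialised and learned during fine-tuning.
model = Wav2Vec2BertForCTC.from_pretrained(
    "mutisya/w2v-bert-per-language-6lang-25_50-v1",
    vocab_size=len(vocab),
    ignore_mismatched_sizes=True,
)
```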
## Training Details
- Starting Point: Pre-trained Kikuyu adapters (WER: 21.37%)
- Dataset: mutisya/Kikuyu_asr_v24_23_1-filtered
- Training Samples: 30000
- Fine-tuning Epochs: 10
- Learning Rate: 0.0005
- Final WER: 14.09%
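A configuration sketch consistent with the hyper-parameters listed above (only the learning rate and epoch count come from this card; batch size, warm-up, and the output path are assumptions):

```python
# Sketch of training arguments matching the listed hyper-parameters.
# Values marked "assumption" are not documented in this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="w2v-bert-v3Hybrid-kik-extendVocab",  # illustrative path
    num_train_epochs=10,               # from this card
    learning_rate=5e-4,                # from this card (0.0005)
    per_device_train_batch_size=8,     # assumption
    gradient_accumulation_steps=2,     # assumption
    warmup_ratio=0.1,                  # assumption
    eval_strategy="epoch",
    save_strategy="epoch",
    fp16=True,
)
```

These arguments would then be passed to a transformers Trainer together with the extended-vocabulary model, a CTC padding data collator, and a compute_metrics function that reports WER.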
## Improvement

| Metric | Pre-trained | After Extended Vocab | Absolute Improvement |
|---|---|---|---|
| WER | 21.37% | 14.09% | 7.28 pp |
## Error Analysis
| Category | Count | Percentage |
|---|---|---|
| Exact Match | 183 | 36.6% |
| Double Vowel Error | 79 | 15.8% |
| Word Boundary Error | 22 | 4.4% |
| Other | 172 | 34.4% |
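The criteria behind these categories are not documented in the card. The sketch below shows one plausible way such a breakdown could be computed from reference/hypothesis pairs; the category definitions and the example sentences are illustrative assumptions.

```python
# Illustrative error categorisation for reference/hypothesis pairs.
# The exact rules used for the table above are assumptions here.
import re

def squash_double_vowels(text: str) -> str:
    """Reduce any doubled vowel (aa, ee, ii, oo, uu, ĩĩ, ũũ) to a single one."""
    return re.sub(r"([aeiouĩũ])\1", r"\1", text)

def categorise(reference: str, hypothesis: str) -> str:
    ref, hyp = reference.strip(), hypothesis.strip()
    if ref == hyp:
        return "Exact Match"
    # Differs only in vowel length.
    if squash_double_vowels(ref) == squash_double_vowels(hyp):
        return "Double Vowel Error"
    # Same characters, different word segmentation.
    if ref.replace(" ", "") == hyp.replace(" ", ""):
        return "Word Boundary Error"
    return "Other"

print(categorise("maaĩ", "maĩ"))          # Double Vowel Error
print(categorise("nĩ mwega", "nĩmwega"))  # Word Boundary Error
```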
## Model tree for mutisya/w2v-bert-v3Hybrid-kik-extendVocab-v1.1
- Base model: mutisya/w2v-bert-per-language-6lang-25_50-v1