# W2V-BERT 2.0 ASR Adapters
This repository contains per-language bottleneck adapters for automatic speech recognition (ASR) trained on top of facebook/w2v-bert-2.0.
## Model Description
- Base Model: facebook/w2v-bert-2.0 (600M parameters, frozen)
- Adapter Architecture: Bottleneck adapters (Pfeiffer-style, dim=64); see the sketch after this list
- Decoder: Lightweight transformer decoder (2 layers)
- Training: CTC loss with extended vocabulary for double vowels
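The adapter block itself is tiny compared to the frozen encoder. Below is a minimal PyTorch sketch of a Pfeiffer-style bottleneck adapter with a 64-dimensional bottleneck, assuming the encoder's 1024-dimensional hidden states; the class and argument names are illustrative, not the exact training code.

```python
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Pfeiffer-style bottleneck adapter: layer norm, down-projection,
    non-linearity, up-projection, residual connection. Only these weights
    (plus the decoder and LM head) are trained; the encoder stays frozen."""

    def __init__(self, hidden_size: int = 1024, bottleneck_dim: int = 64):
        super().__init__()
        self.layer_norm = nn.LayerNorm(hidden_size)
        self.down_proj = nn.Linear(hidden_size, bottleneck_dim)
        self.activation = nn.GELU()
        self.up_proj = nn.Linear(bottleneck_dim, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        residual = hidden_states
        x = self.layer_norm(hidden_states)
        x = self.up_proj(self.activation(self.down_proj(x)))
        return residual + x
```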
## Trained Adapters
Successful: 14/14

- ✅ Good (WER < 30%): 9
- ⚡ Medium (30-60% WER): 3
- ⚠️ Poor (60-90% WER): 1
- ❌ Collapsed (WER >= 90%): 1

| Adapter | WER | Category |
|---|---|---|
| swh_Latn_v2 | 4.00% | ✅ Good |
| swh_Latn_salt | 13.25% | ✅ Good |
| kik_Latn | 16.49% | ✅ Good |
| luo_Latn | 16.50% | ✅ Good |
| swh_Latn_v1 | 17.34% | ✅ Good |
| eng_Latn_tts | 21.85% | ✅ Good |
| eng_Latn_salt | 24.58% | ✅ Good |
| lug_Latn_salt | 28.02% | ✅ Good |
| ach_Latn | 28.62% | ✅ Good |
| kam_Latn | 30.66% | ⚡ Medium |
| mer_Latn | 36.49% | ⚡ Medium |
| teo_Latn | 58.12% | ⚡ Medium |
| nyn_Latn | 64.39% | ⚠️ Poor |
| ful_Latn | 99.98% | ❌ Collapsed |
## Architecture
The model uses the following components (a composition sketch follows this list):
- Frozen w2v-bert-2.0 encoder - Extracts audio representations
- Bottleneck adapters - Language-specific adaptation (trainable)
- Lightweight decoder - Transformer decoder blocks (trainable)
- LM head - Per-language vocabulary projection (trainable)
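A rough end-to-end sketch of how these components could compose is shown below. It reuses the `BottleneckAdapter` sketch from the Model Description section and models the "lightweight decoder" with standard self-attention blocks (since CTC is non-autoregressive); the number of attention heads and the exact insertion points are assumptions, not the training code.

```python
import torch
import torch.nn as nn
from transformers import Wav2Vec2BertModel


class AdapterASRModel(nn.Module):
    """Illustrative composition: frozen w2v-bert-2.0 encoder, a bottleneck
    adapter, two lightweight transformer blocks, a final layer norm, and a
    per-language LM head producing frame-level logits for CTC."""

    def __init__(self, vocab_size: int, hidden_size: int = 1024, bottleneck_dim: int = 64):
        super().__init__()
        self.encoder = Wav2Vec2BertModel.from_pretrained("facebook/w2v-bert-2.0")
        for p in self.encoder.parameters():
            p.requires_grad = False  # the 600M encoder stays frozen

        self.adapter = BottleneckAdapter(hidden_size, bottleneck_dim)  # sketch above
        block = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=8, batch_first=True  # nhead is an assumption
        )
        self.decoder = nn.TransformerEncoder(block, num_layers=2)  # "lightweight decoder"
        self.final_norm = nn.LayerNorm(hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size)

    def forward(self, input_features: torch.Tensor, attention_mask: torch.Tensor | None = None):
        hidden = self.encoder(input_features, attention_mask=attention_mask).last_hidden_state
        hidden = self.decoder(self.adapter(hidden))
        return self.lm_head(self.final_norm(hidden))  # feed to CTC loss / greedy decoding
```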
## Usage
Each adapter folder contains the following files (a loading sketch follows this list):
- `adapter_weights.pt` - Bottleneck adapter weights
- `decoder_weights.pt` - Decoder block weights
- `lm_head_weights.pt` - Language model head weights
- `final_norm_weights.pt` - Final layer norm weights
- `vocab.json` - Language-specific vocabulary
- `adapter_config.json` - Adapter configuration
- `metrics.json` - Training metrics
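There is no single `from_pretrained` entry point for these adapters, so loading is manual. The sketch below shows one way to fetch a language's artifacts alongside the frozen base encoder, assuming one folder per adapter named as in the table above (e.g. `swh_Latn_v2/adapter_weights.pt`); the adapter, decoder, and LM head modules themselves must come from the training code that produced these state dicts.

```python
import json

import torch
from huggingface_hub import hf_hub_download
from transformers import AutoFeatureExtractor, Wav2Vec2BertModel

REPO_ID = "mutisya/w2v-bert-adapters-14lang-e10-25_52-v10"  # this adapter repository
LANG = "swh_Latn_v2"  # assumed layout: one folder per adapter, named as listed above

# Frozen base encoder and its feature extractor
encoder = Wav2Vec2BertModel.from_pretrained("facebook/w2v-bert-2.0").eval()
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/w2v-bert-2.0")

# Fetch the per-language artifacts listed above
adapter_sd = torch.load(hf_hub_download(REPO_ID, f"{LANG}/adapter_weights.pt"), map_location="cpu")
decoder_sd = torch.load(hf_hub_download(REPO_ID, f"{LANG}/decoder_weights.pt"), map_location="cpu")
lm_head_sd = torch.load(hf_hub_download(REPO_ID, f"{LANG}/lm_head_weights.pt"), map_location="cpu")
norm_sd = torch.load(hf_hub_download(REPO_ID, f"{LANG}/final_norm_weights.pt"), map_location="cpu")
with open(hf_hub_download(REPO_ID, f"{LANG}/vocab.json")) as f:
    vocab = json.load(f)

# The adapter, decoder, LM head, and final-norm modules are not part of
# `transformers`; instantiate them from the training code that produced these
# state dicts and restore them with `module.load_state_dict(...)`.
```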
## Training Configuration
- Epochs: 10
- Base Learning Rate: 0.0005 (adaptive based on dataset size)
- Batch Size: 48 x 1
- Extended Vocabulary: True (double-vowel tokens; see the illustration after this list)
- Adapter Dimension: 64
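As a purely hypothetical illustration of what an extended vocabulary for double vowels can look like (the authoritative token set for each language is its `vocab.json`), a character-level CTC vocabulary can be extended so that written long vowels map to a single label:

```python
# Hypothetical illustration only: extend a character-level CTC vocabulary with
# double-vowel tokens so written long vowels map to one label instead of two.
base_chars = list("abcdefghijklmnopqrstuvwxyz' ")
double_vowels = ["aa", "ee", "ii", "oo", "uu"]
tokens = ["<pad>", "<unk>"] + base_chars + double_vowels  # <pad> often doubles as the CTC blank
vocab = {token: idx for idx, token in enumerate(tokens)}
print(len(vocab), vocab["aa"])  # vocabulary size and the long-vowel label id
```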
## License
Apache 2.0