# W2V-BERT 2.0 ASR Adapters
This repository contains per-language bottleneck adapters for automatic speech recognition (ASR) trained on top of facebook/w2v-bert-2.0.
## Model Description
- Base Model: facebook/w2v-bert-2.0 (600M parameters, frozen)
- Adapter Architecture: Bottleneck adapters (Pfeiffer-style, dim=64)
- Decoder: Lightweight transformer decoder (2 layers)
- Training: CTC loss with a vocabulary extended to include double-vowel tokens (sketched below)
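
The double-vowel extension means the CTC vocabulary contains explicit long-vowel tokens in addition to single characters. The snippet below is a minimal PyTorch sketch of that setup; the vocabulary contents, blank token, and tensor shapes are illustrative assumptions, not the actual training code.

```python
import torch
import torch.nn as nn

# Illustrative vocabulary: single characters plus double-vowel tokens
# (assumed for illustration; see each adapter's vocab.json for the real one).
vocab = ["<pad>", "<unk>", " ", "a", "e", "i", "o", "u",
         "aa", "ee", "ii", "oo", "uu", "b", "k", "l", "m", "n"]
blank_id = vocab.index("<pad>")  # assumption: pad token doubles as the CTC blank

ctc_loss = nn.CTCLoss(blank=blank_id, zero_infinity=True)

# Hypothetical shapes: (time, batch, vocab) log-probabilities from the LM head.
log_probs = torch.randn(200, 4, len(vocab)).log_softmax(dim=-1)
targets = torch.randint(1, len(vocab), (4, 30))            # label token ids
input_lengths = torch.full((4,), 200, dtype=torch.long)    # encoder frames per sample
target_lengths = torch.full((4,), 30, dtype=torch.long)    # label length per sample

loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
```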
## Trained Adapters
Training in progress...
| Adapter | Language | WER | Train Samples |
|---|---|---|---|
Successful: 14/14

✅ Good (WER < 30%): 9

- swh_Latn_v2: 4.49%
- swh_Latn_salt: 13.60%
- kik_Latn: 16.41%
- luo_Latn: 16.58%
- swh_Latn_v1: 17.50%
- eng_Latn_tts: 22.24%
- eng_Latn_salt: 24.80%
- lug_Latn_salt: 27.48%
- ach_Latn: 29.28%

⚡ Medium (WER 30-60%): 2

- kam_Latn: 31.20%
- mer_Latn: 36.34%

⚠️ Poor (WER 60-90%): 1

- nyn_Latn: 65.37%

❌ Collapsed (WER >= 90%): 2

- ful_Latn: 100.00%
- teo_Latn: 99.87%
## Architecture
The model uses:
- Frozen w2v-bert-2.0 encoder - Extracts audio representations
- Bottleneck adapters - Language-specific adaptation (trainable; sketched below)
- Lightweight decoder - Transformer decoder blocks (trainable)
- LM head - Per-language vocabulary projection (trainable)
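
The adapter code itself is not included in this card; the following is a minimal PyTorch sketch of a Pfeiffer-style bottleneck adapter with dimension 64 applied to the output of a frozen encoder layer. The hidden size of 1024 and all names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Pfeiffer-style adapter: down-project, non-linearity, up-project, residual."""
    def __init__(self, hidden_size: int = 1024, adapter_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, adapter_dim)
        self.up = nn.Linear(adapter_dim, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen encoder's representation intact.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# Example: adapt the output of one frozen encoder layer (shapes are illustrative).
adapter = BottleneckAdapter(hidden_size=1024, adapter_dim=64)
frozen_layer_output = torch.randn(4, 200, 1024)  # (batch, frames, hidden)
adapted = adapter(frozen_layer_output)
```

Only the adapter, decoder, LM head, and final norm parameters are updated during training; the encoder stays frozen.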
## Usage
Each adapter folder contains the following files (a loading sketch follows the list):
- `adapter_weights.pt` - Bottleneck adapter weights
- `decoder_weights.pt` - Decoder block weights
- `lm_head_weights.pt` - Language model head weights
- `final_norm_weights.pt` - Final layer norm weights
- `vocab.json` - Language-specific vocabulary
- `adapter_config.json` - Adapter configuration
- `metrics.json` - Training metrics
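
Below is a minimal sketch of how these files could be read back with PyTorch. The folder name and the modules that consume the state dicts are hypothetical; the actual model classes come from the training code and are not defined in this repository.

```python
import json
from pathlib import Path
import torch

adapter_dir = Path("swh_Latn_v2")  # hypothetical local folder for one adapter

# Language-specific vocabulary and adapter configuration.
vocab = json.loads((adapter_dir / "vocab.json").read_text())
config = json.loads((adapter_dir / "adapter_config.json").read_text())

# Trainable weights saved per language; keys must match the model definitions
# used during training (not reproduced here).
adapter_state = torch.load(adapter_dir / "adapter_weights.pt", map_location="cpu")
decoder_state = torch.load(adapter_dir / "decoder_weights.pt", map_location="cpu")
lm_head_state = torch.load(adapter_dir / "lm_head_weights.pt", map_location="cpu")
final_norm_state = torch.load(adapter_dir / "final_norm_weights.pt", map_location="cpu")

# e.g. adapter_module.load_state_dict(adapter_state)  # adapter_module is hypothetical
```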
## Training Configuration
- Epochs: 10
- Base Learning Rate: 0.0005 (adaptive based on dataset size)
- Batch Size: 48 x 1
- Extended Vocabulary: True
- Adapter Dimension: 64
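
For reference, the settings above can be collected into a plain Python dict; the field names and the reading of "48 x 1" as per-device batch size times gradient accumulation steps are assumptions, not the exact training configuration format.

```python
# Training configuration as listed above (field names are illustrative).
training_config = {
    "epochs": 10,
    "base_learning_rate": 5e-4,        # scaled per language based on dataset size
    "per_device_batch_size": 48,       # assumed reading of "48 x 1"
    "gradient_accumulation_steps": 1,  # assumed reading of "48 x 1"
    "extended_vocabulary": True,       # adds double-vowel tokens
    "adapter_dim": 64,
}
```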
## License
Apache 2.0