# W2V-BERT 2.0 ASR Adapters
This repository contains per-language bottleneck adapters for automatic speech recognition (ASR) trained on top of facebook/w2v-bert-2.0.
## Model Description
- Base Model: facebook/w2v-bert-2.0 (600M parameters, frozen)
- Adapter Architecture: Bottleneck adapters (Pfeiffer-style, dim=64)
- Decoder: Lightweight transformer decoder (2 layers)
- Training: CTC loss with a vocabulary extended to include double-vowel tokens (sketched below)
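
The double-vowel extension means the CTC vocabulary contains explicit long-vowel tokens in addition to single characters. The snippet below is a minimal PyTorch sketch of that setup; the vocabulary contents, blank token, and tensor shapes are illustrative assumptions, not the actual training code.

```python
import torch
import torch.nn as nn

# Illustrative vocabulary: single characters plus double-vowel tokens
# (assumed for illustration; see each adapter's vocab.json for the real one).
vocab = ["<pad>", "<unk>", " ", "a", "e", "i", "o", "u",
         "aa", "ee", "ii", "oo", "uu", "b", "k", "l", "m", "n"]
blank_id = vocab.index("<pad>")  # assumption: pad token doubles as the CTC blank

ctc_loss = nn.CTCLoss(blank=blank_id, zero_infinity=True)

# Hypothetical shapes: (time, batch, vocab) log-probabilities from the LM head.
log_probs = torch.randn(200, 4, len(vocab)).log_softmax(dim=-1)
targets = torch.randint(1, len(vocab), (4, 30))            # label token ids
input_lengths = torch.full((4,), 200, dtype=torch.long)    # encoder frames per sample
target_lengths = torch.full((4,), 30, dtype=torch.long)    # label length per sample

loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
```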
## Trained Adapters
Training in progress...
| Adapter | Language | WER | Train Samples |
|---|---|---|---|
Successful: 14/14

✅ Good (WER < 30%): 9

- swh_Latn_v2: 4.49%
- swh_Latn_salt: 13.60%
- kik_Latn: 16.41%
- luo_Latn: 16.58%
- swh_Latn_v1: 17.50%
- eng_Latn_tts: 22.24%
- eng_Latn_salt: 24.80%
- lug_Latn_salt: 27.48%
- ach_Latn: 29.28%

⚡ Medium (WER 30-60%): 2

- kam_Latn: 31.20%
- mer_Latn: 36.34%

⚠️ Poor (WER 60-90%): 1

- nyn_Latn: 65.37%

❌ Collapsed (WER >= 90%): 2

- ful_Latn: 100.00%
- teo_Latn: 99.87%
## Architecture
The model uses:
- Frozen w2v-bert-2.0 encoder - Extracts audio representations
- Bottleneck adapters - Language-specific adaptation (trainable; sketched below)
- Lightweight decoder - Transformer decoder blocks (trainable)
- LM head - Per-language vocabulary projection (trainable)
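
The adapter code itself is not included in this card; the following is a minimal PyTorch sketch of a Pfeiffer-style bottleneck adapter with dimension 64 applied to the output of a frozen encoder layer. The hidden size of 1024 and all names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Pfeiffer-style adapter: down-project, non-linearity, up-project, residual."""
    def __init__(self, hidden_size: int = 1024, adapter_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, adapter_dim)
        self.up = nn.Linear(adapter_dim, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen encoder's representation intact.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# Example: adapt the output of one frozen encoder layer (shapes are illustrative).
adapter = BottleneckAdapter(hidden_size=1024, adapter_dim=64)
frozen_layer_output = torch.randn(4, 200, 1024)  # (batch, frames, hidden)
adapted = adapter(frozen_layer_output)
```

Only the adapter, decoder, LM head, and final norm parameters are updated during training; the encoder stays frozen.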
## Usage
Each adapter folder contains the following files (a loading sketch follows the list):
- `adapter_weights.pt` - Bottleneck adapter weights
- `decoder_weights.pt` - Decoder block weights
- `lm_head_weights.pt` - Language model head weights
- `final_norm_weights.pt` - Final layer norm weights
- `vocab.json` - Language-specific vocabulary
- `adapter_config.json` - Adapter configuration
- `metrics.json` - Training metrics
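
Below is a minimal sketch of how these files could be read back with PyTorch. The folder name and the modules that consume the state dicts are hypothetical; the actual model classes come from the training code and are not defined in this repository.

```python
import json
from pathlib import Path
import torch

adapter_dir = Path("swh_Latn_v2")  # hypothetical local folder for one adapter

# Language-specific vocabulary and adapter configuration.
vocab = json.loads((adapter_dir / "vocab.json").read_text())
config = json.loads((adapter_dir / "adapter_config.json").read_text())

# Trainable weights saved per language; keys must match the model definitions
# used during training (not reproduced here).
adapter_state = torch.load(adapter_dir / "adapter_weights.pt", map_location="cpu")
decoder_state = torch.load(adapter_dir / "decoder_weights.pt", map_location="cpu")
lm_head_state = torch.load(adapter_dir / "lm_head_weights.pt", map_location="cpu")
final_norm_state = torch.load(adapter_dir / "final_norm_weights.pt", map_location="cpu")

# e.g. adapter_module.load_state_dict(adapter_state)  # adapter_module is hypothetical
```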
## Training Configuration
- Epochs: 10
- Base Learning Rate: 0.0005 (adaptive based on dataset size)
- Batch Size: 48 x 1
- Extended Vocabulary: True
- Adapter Dimension: 64
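
For reference, the settings above can be collected into a plain Python dict; the field names and the reading of "48 x 1" as per-device batch size times gradient accumulation steps are assumptions, not the exact training configuration format.

```python
# Training configuration as listed above (field names are illustrative).
training_config = {
    "epochs": 10,
    "base_learning_rate": 5e-4,        # scaled per language based on dataset size
    "per_device_batch_size": 48,       # assumed reading of "48 x 1"
    "gradient_accumulation_steps": 1,  # assumed reading of "48 x 1"
    "extended_vocabulary": True,       # adds double-vowel tokens
    "adapter_dim": 64,
}
```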
## License
Apache 2.0