Whisper Small Urdu v2
This model is a fine-tuned version of khawajaaliarshad/whisper-small-urdu, optimized for Urdu speech-to-text. It was trained as part of a research initiative to improve ASR performance for low-resource languages.
Model Results
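The scores below are self-reported on the Common Voice 16.1 test set. The character error rate is about a third of the word error rate, which suggests that many word-level errors differ from the reference by only a few characters.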
| Metric | Value |
|---|---|
| Word Error Rate (WER) | 35.44% |
| Character Error Rate (CER) | 12.05% |
| Final Validation Loss | 0.6692 |
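For reference, WER and CER are both edit-distance ratios: the number of substitutions, deletions, and insertions needed to turn the model output into the reference, divided by the reference length in words or in characters. The following is a minimal pure-Python sketch of that definition. The card does not say which tool produced the reported scores, so the function names, the absence of text normalization, and the choice to count spaces as characters are all assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists or strings)."""
    # Row 0: turning an empty reference into hyp[:j] takes j insertions.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        # curr[j] is the distance between ref[:i] and hyp[:j].
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate (assumption: spaces count as characters)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Toy strings, not model output: one wrong word out of four.
print(wer("a b c d", "a x c d"))  # 0.25
```

Real evaluations usually normalize punctuation and spacing before scoring, which can shift both numbers noticeably.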
Intended Uses & Limitations
Intended Use
- Transcription of Urdu voice recordings.
- Accessibility tools for Urdu speakers.
- Foundation for downstream Urdu NLP tasks (e.g., sentiment analysis of speech).
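A minimal transcription sketch, assuming the checkpoint is loaded through the Hugging Face Transformers `pipeline` API under the repository id `hamza-amin/whisper-small-urdu-v2`. The helper name `transcribe` and the file name in the comment are hypothetical, and decoding an audio file from a path typically requires ffmpeg to be installed.

```python
MODEL_ID = "hamza-amin/whisper-small-urdu-v2"


def transcribe(audio_path, model_id=MODEL_ID):
    """Return the transcription of one audio file as a string."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path)  # a dict with a "text" key
    return result["text"]


# Example call ("sample_urdu.wav" is a placeholder, not a shipped file):
# print(transcribe("sample_urdu.wav"))
```

Building the pipeline once and reusing it is preferable when transcribing many files, since each call to `transcribe` above reloads the model.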
Limitations
- Background Noise: Performance may degrade in noisy environments or with multiple speakers.
- Dialects: Primarily optimized for standard Urdu; accuracy may vary for regional accents.
- Dataset Size: Trained on a subset of Common Voice (1,500 samples), so rare or domain-specific vocabulary may be transcribed incorrectly.
Training Procedure
Training Hyperparameters
- Learning Rate: 5e-06 (Gentle fine-tuning to preserve base weights)
- Batch Size: 8 (Per device)
- Effective Batch Size: 32 (via Gradient Accumulation)
- Steps: 300
- Mixed Precision: FP16
- Optimizer: AdamW
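The relationship between these values can be checked with a little arithmetic. The sketch below assumes a single device and assumes that all 1,500 samples were used for training; neither is stated in this card. With the Hugging Face trainer, the corresponding argument names would be `per_device_train_batch_size`, `gradient_accumulation_steps`, and `max_steps`, though the card does not confirm which training loop was used.

```python
per_device_batch_size = 8
effective_batch_size = 32
max_steps = 300
num_devices = 1       # assumption: single GPU
train_samples = 1500  # assumption: the whole subset was used for training

# Gradient accumulation steps needed to reach the effective batch size.
grad_accum_steps = effective_batch_size // (per_device_batch_size * num_devices)

# Total examples processed over the run, and the implied number of passes.
samples_seen = max_steps * effective_batch_size
approx_epochs = samples_seen / train_samples

print(grad_accum_steps)          # 4
print(samples_seen)              # 9600
print(round(approx_epochs, 1))   # 6.4
```

Under these assumptions the model sees each training sample about six times, which is consistent with the training loss falling well below the validation loss.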
Training Progress
| Step | Training Loss | Validation Loss |
|---|---|---|
| 100 | 1.6249 | 1.0378 |
| 200 | 0.2065 | 0.6495 |
| 300 | 0.0993 | 0.6692 |
Note: Training was stopped at 300 steps. Validation loss reached its minimum at step 200 (0.6495) and rose slightly to 0.6692 by step 300 while training loss kept falling, which indicates that further training would likely overfit.
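Using only the logged values from the table above, the following sketch shows one way to pick the checkpoint with the lowest validation loss and to track the gap between validation and training loss. The helper name `best_checkpoint` is hypothetical, and the card does not say whether a step-200 checkpoint was saved or which step the published weights come from.

```python
history = [
    {"step": 100, "train_loss": 1.6249, "val_loss": 1.0378},
    {"step": 200, "train_loss": 0.2065, "val_loss": 0.6495},
    {"step": 300, "train_loss": 0.0993, "val_loss": 0.6692},
]


def best_checkpoint(rows):
    """Return the logged row with the lowest validation loss."""
    return min(rows, key=lambda row: row["val_loss"])


best = best_checkpoint(history)
print(best["step"])  # 200

# A growing positive gap (validation minus training) is a common overfitting signal.
for row in history:
    gap = row["val_loss"] - row["train_loss"]
    print(row["step"], round(gap, 4))
```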
Framework Versions
- Transformers: 5.0.0
- PyTorch: 2.10.0+cu128
- Datasets: 4.8.3
- Tokenizers: 0.22.2
Developed by: Hamza Amin
Location: Ghulam Ishaq Khan Institute (GIKI), Pakistan.
Model tree for hamza-amin/whisper-small-urdu-v2
- Base model: openai/whisper-small
- Fine-tuned from: khawajaaliarshad/whisper-small-urdu
Evaluation results
- Test WER on Common Voice 16.1 (test set, self-reported): 35.44
- Test CER on Common Voice 16.1 (test set, self-reported): 12.05