Whisper Small Urdu v2 🎙️

This model is a fine-tuned version of khawajaaliarshad/whisper-small-urdu, optimized for Urdu speech-to-text. It was trained as part of a research initiative to improve automatic speech recognition (ASR) performance for low-resource languages.
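
You can transcribe audio with the Hugging Face automatic-speech-recognition pipeline. The snippet below is a minimal sketch: the audio file name is a placeholder, and decoding options may need tuning for your setup.

```python
# Minimal transcription sketch using the transformers pipeline API.
# "sample_urdu_audio.wav" is a placeholder path, not a file shipped with the model.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="hamza-amin/whisper-small-urdu-v2",
    generate_kwargs={"language": "urdu", "task": "transcribe"},
)

print(asr("sample_urdu_audio.wav")["text"])
```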

Model Results

The model demonstrates strong phonetic accuracy (its character error rate is well below its word error rate), particularly in handling the complex morphology of Urdu.

| Metric | Value |
|---|---|
| Word Error Rate (WER) | 35.44% |
| Character Error Rate (CER) | 12.05% |
| Final Validation Loss | 0.6692 |
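
For reference, WER and CER figures like those above can be computed on your own test set with the Hugging Face evaluate library; the strings below are illustrative placeholders, not samples from the actual evaluation data.

```python
# Sketch of how WER/CER metrics like those reported above are typically computed.
# The reference/prediction strings are placeholders only.
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

references = ["یہ ایک مثال ہے"]    # ground-truth transcripts
predictions = ["یہ ایک مثال تھی"]  # model outputs (one word differs here)

print("WER:", wer_metric.compute(references=references, predictions=predictions))
print("CER:", cer_metric.compute(references=references, predictions=predictions))
```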

Intended Uses & Limitations

Intended Use

  • Transcription of Urdu voice recordings.
  • Accessibility tools for Urdu speakers.
  • Foundation for downstream Urdu NLP tasks (e.g., sentiment analysis of speech).

Limitations

  • Background Noise: Performance may degrade in noisy environments or with multiple speakers.
  • Dialects: Primarily optimized for standard Urdu; accuracy may drop for regional accents and dialects.
  • Dataset Size: Trained on a 1,500-sample subset of Common Voice, so rare or domain-specific vocabulary may be misrecognized.

Training Procedure

Training Hyperparameters

  • Learning Rate: 5e-06 (Gentle fine-tuning to preserve base weights)
  • Batch Size: 8 (Per device)
  • Effective Batch Size: 32 (gradient accumulation over 4 steps)
  • Steps: 300
  • Mixed Precision: FP16
  • Optimizer: AdamW
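
These settings correspond roughly to a Seq2SeqTrainingArguments configuration like the sketch below. The output directory and evaluation/logging cadence are assumptions (the 100-step interval simply matches the log in the next section), and exact argument names can vary slightly between transformers versions.

```python
# Illustrative training configuration matching the hyperparameters listed above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-urdu-v2",  # assumed output path
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,         # 8 x 4 = effective batch size of 32
    learning_rate=5e-6,
    max_steps=300,
    fp16=True,
    evaluation_strategy="steps",           # assumed; eval every 100 steps matches the log below
    eval_steps=100,
    logging_steps=100,
    # AdamW is the Trainer default optimizer, so it is not set explicitly.
)
```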

Training Progress

| Step | Training Loss | Validation Loss |
|---|---|---|
| 100 | 1.6249 | 1.0378 |
| 200 | 0.2065 | 0.6495 |
| 300 | 0.0993 | 0.6692 |

Note: Training was stopped at 300 steps because the validation loss had begun to plateau, indicating convergence while limiting the risk of overfitting.

Framework Versions

  • Transformers: 5.0.0
  • PyTorch: 2.10.0+cu128
  • Datasets: 4.8.3
  • Tokenizers: 0.22.2

Developed by: Hamza Amin
Location: Ghulam Ishaq Khan Institute (GIKI), Pakistan.

Model Details

  • Model ID: hamza-amin/whisper-small-urdu-v2
  • Model size: ~0.2B parameters (safetensors, F32)
  • Finetuned from: khawajaaliarshad/whisper-small-urdu