Whisper Small Urdu v2 🎙️

This model is a fine-tuned version of khawajaaliarshad/whisper-small-urdu, optimized for Urdu speech-to-text. It was trained as part of a research initiative to improve automatic speech recognition (ASR) performance for low-resource languages.
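
You can transcribe audio with the Hugging Face automatic-speech-recognition pipeline. The snippet below is a minimal sketch: the audio file name is a placeholder, and decoding options may need tuning for your setup.

```python
# Minimal transcription sketch using the transformers pipeline API.
# "sample_urdu_audio.wav" is a placeholder path, not a file shipped with the model.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="hamza-amin/whisper-small-urdu-v2",
    generate_kwargs={"language": "urdu", "task": "transcribe"},
)

print(asr("sample_urdu_audio.wav")["text"])
```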

Model Results

The model demonstrates strong phonetic accuracy (its character error rate is well below its word error rate), particularly in handling the complex morphology of Urdu.

| Metric | Value |
|---|---|
| Word Error Rate (WER) | 35.44% |
| Character Error Rate (CER) | 12.05% |
| Final Validation Loss | 0.6692 |
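
For reference, WER and CER figures like those above can be computed on your own test set with the Hugging Face evaluate library; the strings below are illustrative placeholders, not samples from the actual evaluation data.

```python
# Sketch of how WER/CER metrics like those reported above are typically computed.
# The reference/prediction strings are placeholders only.
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

references = ["یہ ایک مثال ہے"]    # ground-truth transcripts
predictions = ["یہ ایک مثال تھی"]  # model outputs (one word differs here)

print("WER:", wer_metric.compute(references=references, predictions=predictions))
print("CER:", cer_metric.compute(references=references, predictions=predictions))
```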

Intended Uses & Limitations

Intended Use

  • Transcription of Urdu voice recordings.
  • Accessibility tools for Urdu speakers.
  • Foundation for downstream Urdu NLP tasks (e.g., sentiment analysis of speech).

Limitations

  • Background Noise: Performance may degrade in noisy environments or with multiple speakers.
  • Dialects: Primarily optimized for standard Urdu; accuracy may drop for regional accents and dialects.
  • Dataset Size: Trained on a 1,500-sample subset of Common Voice, so rare or domain-specific vocabulary may be misrecognized.

Training Procedure

Training Hyperparameters

  • Learning Rate: 5e-06 (Gentle fine-tuning to preserve base weights)
  • Batch Size: 8 (Per device)
  • Effective Batch Size: 32 (gradient accumulation over 4 steps)
  • Steps: 300
  • Mixed Precision: FP16
  • Optimizer: AdamW
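
These settings correspond roughly to a Seq2SeqTrainingArguments configuration like the sketch below. The output directory and evaluation/logging cadence are assumptions (the 100-step interval simply matches the log in the next section), and exact argument names can vary slightly between transformers versions.

```python
# Illustrative training configuration matching the hyperparameters listed above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-urdu-v2",  # assumed output path
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,         # 8 x 4 = effective batch size of 32
    learning_rate=5e-6,
    max_steps=300,
    fp16=True,
    evaluation_strategy="steps",           # assumed; eval every 100 steps matches the log below
    eval_steps=100,
    logging_steps=100,
    # AdamW is the Trainer default optimizer, so it is not set explicitly.
)
```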

Training Progress

| Step | Training Loss | Validation Loss |
|---|---|---|
| 100 | 1.6249 | 1.0378 |
| 200 | 0.2065 | 0.6495 |
| 300 | 0.0993 | 0.6692 |

Note: Training was stopped at 300 steps because the validation loss had begun to plateau, indicating convergence while limiting the risk of overfitting.

Framework Versions

  • Transformers: 5.0.0
  • PyTorch: 2.10.0+cu128
  • Datasets: 4.8.3
  • Tokenizers: 0.22.2

Developed by: Hamza Amin
Location: Ghulam Ishaq Khan Institute (GIKI), Pakistan.

Model Details

  • Model ID: hamza-amin/whisper-small-urdu-v2
  • Model size: ~0.2B parameters (safetensors, F32)
  • Finetuned from: khawajaaliarshad/whisper-small-urdu