Whisper Large V3 Russian Podlodka

This repository contains a fine-tuned Whisper Large V3 model for Russian speech recognition. It serves as the core transcription component of the Pisets system, specifically optimized for long audio recordings such as lectures and interviews.

The model was presented in the paper Pisets: A Robust Speech Recognition System for Lectures and Interviews.
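
A minimal sketch of loading this checkpoint with the Transformers `pipeline` API, assuming `transformers` and `torch` are installed. The model ID is this repository's name; the helper names `build_transcriber` and `transcribe_file` and the 30-second chunk length are illustrative assumptions, not settings taken from the Pisets code.

```python
MODEL_ID = "bond005/whisper-large-v3-ru-podlodka"


def build_transcriber(model_id: str = MODEL_ID, device: str = "cpu"):
    """Create an ASR pipeline for this checkpoint (hypothetical helper)."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        device=device,
        chunk_length_s=30,  # assumption: one Whisper-sized window per chunk
    )


def transcribe_file(path: str, device: str = "cpu") -> str:
    """Transcribe one audio file in Russian and return the plain text."""
    asr = build_transcriber(device=device)
    # Decoding a file path requires ffmpeg to be available on the system.
    result = asr(
        path,
        generate_kwargs={"language": "russian", "task": "transcribe"},
    )
    return result["text"]
```

Calling `transcribe_file` on a local recording downloads the weights on first use. This runs the Whisper model on its own, without the segmentation and filtering stages of the full Pisets system.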

System Architecture

The Pisets system implements a three-component architecture to improve recognition accuracy while minimizing hallucinations:

  1. Wav2Vec2: For primary recognition and segmentation.
  2. Audio Spectrogram Transformer (AST): For filtering non-speech segments.
  3. Whisper (this model): For the final high-quality transcription.
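
The three stages above can be sketched as a chain of pluggable callables. This is only an illustration of the control flow, not the Pisets implementation: the function `transcribe_long_audio` and the `segment`, `is_speech`, and `transcribe` interfaces are hypothetical stand-ins for the Wav2Vec2, AST, and Whisper components.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[float, float]  # (start_s, end_s)


def transcribe_long_audio(
    audio: Sequence[float],
    sample_rate: int,
    segment: Callable[[Sequence[float], int], List[Segment]],
    is_speech: Callable[[Sequence[float], int], bool],
    transcribe: Callable[[Sequence[float], int], str],
) -> List[Tuple[float, float, str]]:
    """Segment the audio, drop non-speech chunks, transcribe the rest."""
    results = []
    # Stage 1 (hypothetical interface): split the recording into segments.
    for start, end in segment(audio, sample_rate):
        chunk = audio[int(start * sample_rate):int(end * sample_rate)]
        # Stage 2: skip segments the filter classifies as non-speech.
        if not is_speech(chunk, sample_rate):
            continue
        # Stage 3: final transcription of the remaining segment.
        text = transcribe(chunk, sample_rate).strip()
        if text:
            results.append((start, end, text))
    return results
```

Filtering before the final transcription means the last stage never sees segments judged to be non-speech, which is the part of the design aimed at reducing hallucinated text.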

Implementation

The complete source code and instructions for using the system (including generation of SRT and DocX files) can be found in the GitHub repository:

GitHub: https://github.com/bond005/pisets
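
For orientation, SRT output can be produced from timestamped segments with a few lines of code. The sketch below is not the code from the Pisets repository; the names `format_timestamp` and `to_srt` and the `(start, end, text)` segment layout are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start_s, end_s, text) segments as SRT subtitle text."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text}\n"
        )
    # A blank line separates consecutive subtitle entries.
    return "\n".join(blocks)
```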

Citation

If you use this model or the Pisets system in your research, please cite:

@article{bondarenko2026pisets,
  title={Pisets: A Robust Speech Recognition System for Lectures and Interviews},
  author={Ivan Bondarenko},
  journal={arXiv preprint arXiv:2601.18415},
  year={2026}
}

Model size: 2B params (Safetensors, tensor type F32)

Evaluation results

  • WER (with punctuation and capital letters) on Podlodka.io: 20.910 (self-reported)
  • WER (without punctuation) on Podlodka.io: 10.987 (self-reported)
  • WER (without punctuation) on Russian Librispeech: 9.795 (self-reported)
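
The gap between the two Podlodka.io figures comes from how the text is normalized before scoring. Below is a minimal sketch of word error rate computed with and without punctuation and case. The normalization rule here (lowercasing and replacing punctuation with spaces) is an assumption and may differ from the one used for the reported numbers; the function returns a fraction, whereas the values above appear to be percentages.

```python
import re
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    """Assumed normalization: lowercase and replace punctuation with spaces."""
    return re.sub(r"[^\w\s]", " ", text.lower())


def wer(reference: str, hypothesis: str, strip_punct: bool = False) -> float:
    """Word error rate as a fraction of the reference word count."""
    if strip_punct:
        reference, hypothesis = normalize(reference), normalize(hypothesis)
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)
```

With the reference "Привет, мир!" and the hypothesis "привет мир", the strict score counts both words as errors, while the normalized score counts none.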