Does anyone know how to tag the speaker with Whisper?
I tried the model on an interview recording, and it worked pretty well. The problem is that the output is one big chunk of text, and I have no idea how to tag the different speakers. I assume Whisper can distinguish different voices. Is there an easy way to do that?
Hello. As far as I can see, this is what you need: https://huggingface.co/learn/audio-course/chapter7/transcribe-meeting
You can also have a look at WhisperX: https://github.com/m-bain/whisperX
But no, "speaker diarization" (distinguishing speakers) is NOT a feature of the Whisper model, as it was not trained for this task.
BTW, I managed to tag the speakers in my primary research interview recordings using the code here: https://colab.research.google.com/drive/1V-Bt5Hm2kjaDb4P1RyMSswsDKyrzc2-3?usp=sharing#scrollTo=ACobbJnIR_ni
Speaker diarization is not possible with this model (or any Whisper model). You are using pyannote, which is a different thing. Also, you need to agree to their terms (or complete a form) before you can use it.
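For anyone landing here later, this is roughly how the two pieces fit together: Whisper gives you timestamped text, a separate diarization model (such as pyannote) gives you timestamped speaker turns, and you match them by time overlap. The sketch below only shows the matching step. The function name `tag_speakers`, the `Segment` tuple layout, and the sample data are my own assumptions for illustration, not something from the Colab notebook or from either library.

```python
from typing import Dict, List, Tuple

# Assumed layout for one diarization turn: (start_sec, end_sec, speaker_label).
# With pyannote you could build these from
#   diarization.itertracks(yield_label=True)  ->  (turn.start, turn.end, speaker)
Segment = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def tag_speakers(chunks: List[Dict], segments: List[Segment]) -> List[Tuple[str, str]]:
    """Label each transcript chunk with the speaker whose turn overlaps it most.

    `chunks` is assumed to look like the "chunks" list returned by the
    Transformers ASR pipeline when called with return_timestamps=True:
    [{"text": "...", "timestamp": (start, end)}, ...]
    """
    tagged = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        if end is None:
            # The last chunk may have no end timestamp; as a simplification,
            # treat it as running to the end of the audio.
            end = float("inf")
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for seg_start, seg_end, speaker in segments:
            shared = overlap(start, end, seg_start, seg_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        tagged.append((best_speaker, chunk["text"].strip()))
    return tagged


# Made-up sample data standing in for real model output.
example_chunks = [
    {"text": " Hello, thanks for joining.", "timestamp": (0.0, 2.5)},
    {"text": " Happy to be here.", "timestamp": (2.5, 4.0)},
]
example_segments = [(0.0, 2.4, "SPEAKER_00"), (2.4, 4.2, "SPEAKER_01")]

for speaker, text in tag_speakers(example_chunks, example_segments):
    print(f"{speaker}: {text}")
```

Picking the speaker by largest overlap is the simplest rule that works for clean interviews. It will mislabel chunks where two people talk within one Whisper segment, so word-level timestamps (which is what WhisperX aims for) may be a better fit for fast back-and-forth.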