vosk-model-ja-medical
Japanese medical speech recognition model based on vosk-model-ja-0.22 and adapted with the DMiME (Dictionary of Medical terms in MEdical informatics) Japanese medical dictionary (~42,000 medical terms). Training data was created by synthesizing speech with Azure Speech Service and Google Cloud TTS from medical terms extracted from DMiME.
Overview
This model extends the standard Vosk Japanese model with medical vocabulary, including disease names, drug names, clinical procedures, and anatomical terms. The language model was adapted by SRILM interpolation with a morphologically segmented medical text corpus.
Performance
Evaluated on 500 TTS-synthesized medical utterances (Azure Speech ja-JP-NanamiNeural):
| Model | CER | Change (absolute) |
|---|---|---|
| vosk-model-ja-0.22 (baseline) | 14.17% | — |
| vosk-model-ja-medical | 12.60% | -1.57% |
- Improved: 137 / Same: 226 / Degraded: 137 (out of 500)
- Best improvements on short medical terms (≤10 chars): -2.0% CER
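For readers who want to compute a comparable figure on their own data, here is a minimal sketch of character error rate as Levenshtein distance divided by reference length. The function names are illustrative, and pooling edits over the whole test set (rather than averaging per utterance) is an assumption; how the numbers above were aggregated is not stated.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_cer(pairs):
    """Pooled CER: total edit distance over total reference characters.

    pairs: iterable of (reference, hypothesis) strings.
    """
    pairs = list(pairs)
    total_edits = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total_chars = sum(len(ref) for ref, _ in pairs)
    return total_edits / total_chars if total_chars else 0.0
```

Multiply the returned fraction by 100 to compare it with the percentages in the table.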
Example improvements
| Reference | Baseline | Medical model |
|---|---|---|
| スルトプリド塩酸塩の経過を観察する | する と プリ 人 塩酸 塩 の 経過 を 観察 する | スルトプリド 塩酸 塩 の 経過 を 観察 する |
| 下腿挫滅創 | 硬い 挫滅 そう | 下腿挫滅創 |
| 会陰部裂傷縫合不全 | 遠因 武烈 性 縫合 不全 | 会陰 部 裂傷 縫合 不全 |
| 肺エキノコックス症 | 廃液 の コックス 生 | 肺 エキノコックス 症 |
| アクリノール消毒液です | 悪意 の 居る 消毒 液 です | アクリノール 消毒 液 です |
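The recognizer output in the table is a sequence of space-separated morphemes, while the references are unsegmented. A character-level comparison would plausibly strip those spaces first. The helper below is a small sketch of that step; its name is hypothetical, and whether the reported scores were computed after this normalization is an assumption.

```python
def strip_token_spaces(hypothesis: str) -> str:
    """Remove all whitespace between tokens, including full-width spaces."""
    # str.split() with no arguments splits on any Unicode whitespace.
    return "".join(hypothesis.split())


# Rows taken from the table above: (reference, medical model output)
rows = [
    ("肺エキノコックス症", "肺 エキノコックス 症"),
    ("会陰部裂傷縫合不全", "会陰 部 裂傷 縫合 不全"),
]
exact_matches = sum(ref == strip_token_spaces(hyp) for ref, hyp in rows)
```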
Usage
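import json
import wave

from vosk import Model, KaldiRecognizer

# Path to the unpacked model directory
model = Model("vosk-model-ja-medical")

# Vosk expects mono 16-bit PCM WAV input
wf = wave.open("audio.wav", "rb")
rec = KaldiRecognizer(model, wf.getframerate())

texts = []
while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    # True marks the end of an utterance; collect that segment before moving on
    if rec.AcceptWaveform(data):
        texts.append(json.loads(rec.Result())["text"])
wf.close()

# FinalResult() flushes whatever audio remains after the last utterance boundary
texts.append(json.loads(rec.FinalResult())["text"])
print(" ".join(t for t in texts if t))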
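Vosk's own Python examples require mono, 16-bit, uncompressed PCM WAV input, so it can help to check a file before decoding it. The following is a small sketch using only the standard `wave` module; the function name `is_vosk_ready` is hypothetical and not part of the Vosk API.

```python
import wave


def is_vosk_ready(path: str) -> bool:
    """Return True if the WAV file is mono, 16-bit, uncompressed PCM."""
    with wave.open(path, "rb") as wf:
        return (wf.getnchannels() == 1
                and wf.getsampwidth() == 2      # 2 bytes per sample = 16-bit
                and wf.getcomptype() == "NONE")  # no compression
```

Files that fail the check can be converted with any audio tool before being passed to the recognizer.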
Model details
- Base model: vosk-model-ja-0.22 (Alphacephei)
- Acoustic model: Unchanged (Kaldi nnet3/chain TDNN)
- Language model: SRILM 4-gram, Witten-Bell discounting, interpolated with base LM (lambda=0.95)
- Vocabulary: ~350,000 words (base 309K + medical 42K, morphologically segmented)
- Medical dictionary: DMiME v1.1 (42,467 entries)
- Morphological analysis: fugashi (MeCab + UniDic-lite) for compound term segmentation
- Medical collocation boost: 5,758 medical suffix patterns (性, 症, 炎, 腫, 癌, 病) repeated in the LM training text for emphasis
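Linear interpolation, as used for the language model above, mixes the probabilities that two models assign to the same n-gram. The sketch below shows the arithmetic only. The list gives lambda=0.95 but does not say which model receives that weight, so the code keeps the two sides generic; in SRILM's `ngram` tool, `-lambda` weights the main `-lm` model, but which model was passed as main is not stated here. The probabilities in the example are made up.

```python
def interpolate(p_primary: float, p_secondary: float, lam: float = 0.95) -> float:
    """Linear interpolation of two conditional probabilities for one n-gram.

    lam is the weight of the primary model; the secondary gets 1 - lam.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return lam * p_primary + (1.0 - lam) * p_secondary


# Toy case: an n-gram that only one of the two models has seen.
# Even a small weight on the model that knows it keeps the probability non-zero.
p_mixed = interpolate(p_primary=0.0, p_secondary=0.2, lam=0.95)
```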
Adaptation method
- DMiME terms are parsed and morphologically segmented with fugashi
- New lexicon entries are generated with a hiragana-to-Kaldi phone mapping
- A medical text corpus (197K sentences) is generated from templates, with morphemes separated by spaces
- SRILM: a Witten-Bell 4-gram is trained on the medical corpus and interpolated with the base LM (lambda=0.95)
- Kaldi: prepare_lang.sh and mkgraph.sh rebuild HCLG.fst
- Rescoring uses the full interpolated LM (G.carpa, 1.1 GB)
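The template-based corpus step can be sketched as follows. This is an illustration under assumptions, not the script used for this model: the terms are taken as already segmented (for example by fugashi, as described above), the two templates are hypothetical and merely modeled on the evaluation examples, and `repeat` is a guess at how repetition for emphasis might be implemented.

```python
from itertools import product


def build_corpus(segmented_terms, templates, repeat=1):
    """Yield space-separated sentences by filling each template with each term.

    segmented_terms: list of morpheme lists, e.g. [["肺", "エキノコックス", "症"]]
    templates: strings with one "{term}" slot, already space-separated
    repeat: how many times to emit each sentence (raises its n-gram counts)
    """
    for morphemes, template in product(segmented_terms, templates):
        sentence = template.format(term=" ".join(morphemes))
        for _ in range(repeat):
            yield sentence


# Hypothetical templates and terms, for illustration only
templates = ["{term} の 経過 を 観察 する", "{term} です"]
terms = [["肺", "エキノコックス", "症"], ["アクリノール", "消毒", "液"]]
corpus = list(build_corpus(terms, templates))
```

Writing `corpus` to a text file, one sentence per line, gives input in the form that n-gram training tools such as SRILM's `ngram-count` read.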
Limitations
- Homophones: Words like 性/製/勢 (all "sei") may be confused in non-medical contexts
- Long compound terms: Very long medical terms (>10 chars) may still be split into common words
- Evaluation: Tested on TTS-synthesized speech only; real clinical speech may differ
- GPL2 license: Modifications and redistributions must remain under GPL2
License
This model is licensed under GPL-2.0 (GNU General Public License, version 2), which it inherits from the DMiME dictionary license.
- Base Vosk model: Apache 2.0 (compatible with GPL2 redistribution)
- DMiME dictionary: GPL2 (Copyright (C) 2016 Kmm, Project DMiME)
- Combined model: GPL-2.0
DMiME is built on top of ORCA's kana-kanji conversion medical dictionary (also GPL2), developed by the Japan Medical Association Research Institute.
Citation
If you use this model, please cite:
@misc{vosk-model-ja-medical,
  title={vosk-model-ja-medical: Japanese Medical Speech Recognition Model},
  author={kenrouse},
  year={2026},
  url={https://huggingface.co/kenrouse/vosk-model-ja-medical}
}
Acknowledgments
- Alphacephei / Vosk — Base model and compile package
- DMiME (Dictionary of Medical terms in MEdical informatics) — Japanese medical dictionary
- Nickolay Shmyrev — Technical guidance and SRILM pointer