Instructions to use saillab/medgemma-4b-full-lora-mimic-mt-12k with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use saillab/medgemma-4b-full-lora-mimic-mt-12k with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("google/medgemma-4b-it") model = PeftModel.from_pretrained(base_model, "saillab/medgemma-4b-full-lora-mimic-mt-12k") - Notebooks
- Google Colab
- Kaggle
MedGemma-4B Full LoRA (Layers 0-33) β Multi-task n=12K
LoRA adapter for google/medgemma-4b-it,
released as part of "Mechanistically Guided LoRA Improves Paraphrase
Consistency in Medical Vision-Language Models" (Sadanadan & Behzadan,
CHIL 2026).
This is the full arm of the paper: rank-16 adapters applied to all 34 layers of the language model. It serves as the high-capacity contrast point to the targeted-layer (L15-19) adapter, isolating the question of whether mechanistically motivated layer selection matters versus distributing adaptation across the whole stack.
This release corresponds to the multi-task n=12K scale-up of the n=500 binary checkpoint reported in the submitted CHIL paper. It uses a sequence-level cross-entropy + symmetric KL loss compatible with all MIMIC-CXR question types.
Training
| Setting | Value |
|---|---|
| Base model | google/medgemma-4b-it |
Adapter rank (r) |
16 |
alpha |
32 |
| Dropout | 0.05 |
| Learning rate | 2e-4 |
| Effective batch size | 8 (batch 1, grad-accum 8) |
| Epochs | 3 |
| Target layers | 0-33 (all) |
| Target modules | Q, K, V, O attention projections + gate, up, down MLP projections |
| Training data | MIMIC-CXR train split, all question types, ~2,865 unique questions Γ 3 epochs of random paraphrase sampling β 8,600 paraphrase pairs |
| Loss | Sequence-level cross-entropy on first answer token + symmetric KL divergence between paraphrase predictions |
| Trainable parameters | 29.8M (0.69% of base) |
Usage
from transformers import AutoProcessor, AutoModelForImageTextToText
from peft import PeftModel
import torch
base = AutoModelForImageTextToText.from_pretrained(
"google/medgemma-4b-it",
dtype=torch.bfloat16,
device_map="cuda",
)
model = PeftModel.from_pretrained(base, "saillab/medgemma-4b-full-lora-mimic-mt-12k")
processor = AutoProcessor.from_pretrained("saillab/medgemma-4b-full-lora-mimic-mt-12k")
Intended use
Research on medical-VLM paraphrase robustness and LoRA-based fine-tuning. Not for clinical use. The CHIL paper documents that this fully fine-tuned variant achieves the lowest flip rate but at the cost of higher text-only agreement β the model relies more on language priors than image evidence relative to the targeted-layer variant.
Citation (primary β CHIL 2026)
@inproceedings{sadanadan2026mechanistic,
title = {Mechanistically Guided LoRA Improves Paraphrase Consistency in Medical Vision-Language Models},
author = {Sadanadan, Binesh and Behzadan, Vahid},
booktitle = {Conference on Health, Inference, and Learning (CHIL)},
year = {2026}
}
Companion evaluation work
@misc{sadanadan2026heatmap,
title = {Attention Without Grounding: Causal Evaluation of Visual Explanations in Medical Vision-Language Models},
author = {Sadanadan, Binesh and Behzadan, Vahid},
year = {2026},
note = {Pre-print, SAIL Lab, University of New Haven}
}
License
Distributed under the Gemma Terms of Use, inheriting the licensing terms of the base model.
- Downloads last month
- -