Model release mismatches the paper?

#3
by rishi-via - opened

The model's size, etc., seems significantly smaller than the paper's claim of using a Gemma 2 9B model as the LLM would imply. Here's Claude's summary. Am I misunderstanding something?
Screenshot 2026-05-09 at 12.49.46 PM

BharatGen AI org

Hi rishi-via,
Thanks for pointing this out. The paper is referenced because it introduced the SMEAR-MoE projector design, an architectural approach we independently implemented in our own system. The paper and our model share this architectural pattern, but nothing else: the datasets, the encoder, the LLM, and the training setup are entirely different.
We understand this created confusion and appreciate you raising it. We will update the model card to make this distinction clearer.

AVJ18 changed discussion status to closed
