Model release mismatches the paper?

by rishi-via - opened about 1 month ago

The released model's size and other details seem significantly smaller than the paper's claim of using a Gemma 2 9B model as the LLM. Here's Claude's summary. Am I misunderstanding something?

AVJ18

BharatGen AI org 10 days ago

Hi rishi-via,
Thanks for pointing this out. The paper is referenced because it introduced the SMEAR-MoE projector design, an architectural approach we independently implemented in our own system. The paper and our model share this architectural pattern, but nothing else: the datasets, the encoder, the LLM, and the training setup are entirely different.
We understand this created confusion and appreciate you raising it. We will update the model card to make this distinction clearer.

AVJ18 changed discussion status to closed 10 days ago
