Model release mismatches the paper?
#3
by rishi-via - opened
Hi rishi-via,
Thanks for pointing this out. The paper is referenced because it introduced the SMEAR-MoE projector design, an architectural approach that we implemented independently in our own system. That architectural pattern is the only thing the paper and our model share: the datasets, the encoder, the LLM, and the training setup are entirely different.
We understand this created confusion and appreciate you raising it. We will update the model card to make this distinction clearer.
AVJ18 changed discussion status to closed
