Actual context length?
README says 256k but config.json has "max_position_embeddings": 393216, so e.g. vLLM infers that as the max length. This is not a small difference.
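For reference, this is roughly how the discrepancy shows up when loading the config with transformers; a minimal sketch, and the model id below is just a placeholder for this repository:

```python
from transformers import AutoConfig

# Placeholder model id; substitute the actual repository name.
config = AutoConfig.from_pretrained("org/model-name")

print(config.max_position_embeddings)  # 393216, not the 262144 (256k) stated in the README
print(config.rope_scaling)             # the YaRN block that vLLM uses when deriving the max length
```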
Hi! Thanks for bringing this up.
max_position_embeddings is definitely correct here, as it's the value you can compute from the YaRN config (scaling factor * original max position embeddings).
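Roughly, the arithmetic looks like the sketch below. The concrete original length and factor are assumptions for illustration; the real numbers live in the rope_scaling block of config.json, but the relationship is the point:

```python
# Illustrative values only -- read the actual ones from the rope_scaling
# section of config.json.
original_max_position_embeddings = 131072  # assumed pre-YaRN context length
yarn_scaling_factor = 3.0                  # assumed YaRN "factor"

# max_position_embeddings is derived as factor * original length
max_position_embeddings = int(yarn_scaling_factor * original_max_position_embeddings)
print(max_position_embeddings)  # 393216, matching the value in config.json
```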
I've started investigating the issue: we also pass max_seq_len = 256k in the config, which is the value we recommend and the one you found in the model card. However, vLLM seems to enforce 393k based on the computation from the YaRN config, not the value specified by max_position_embeddings or max_seq_len. I'll continue investigating to determine whether this is intended behavior or a bug, and will open a PR/discussion with the vLLM team tomorrow morning.
Hey:
So after discussing with the vLLM team, I realized I had misunderstood the scope of max_seq_len inside the codebase. The two parameters have the same semantics there, so having two different values makes little sense.
The good news is that this misconfiguration has zero effect on the model's performance. It does, however, affect memory allocation, which is not what we intended.
I'm discussing with the vLLM team the best approach to set a default value (if possible) and will update params.json / README.md accordingly soon. In the meantime, you can pass --max-model-len 262144 to the serving command.
Edit: I updated the README and params.json; please make sure to pass --max-model-len 262144.
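If you serve through the Python API rather than the CLI, the equivalent cap is the max_model_len argument. A minimal sketch, with a placeholder model id:

```python
from vllm import LLM, SamplingParams

# Placeholder model id; replace with the actual repository name.
# max_model_len limits the context (and KV-cache allocation) to the recommended 256k.
llm = LLM(model="org/model-name", max_model_len=262144)

outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```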
All clear, thanks for the instructions and for investigating this.