This model frequently generates nonsense sequences

#1
by CharlesChen2023 - opened

I am encountering an issue with a quantized version of the [Model Name] model. The model frequently generates nonsense sequences (e.g., 人事, 出生, etc.); these two tokens should have been `(.*?)`.

Intel org

Thank you for the information. Would you mind sharing the serving command and the evaluation prompts, which we can use to evaluate model quality when producing a new quantized version?
The issue is being tracked here: https://github.com/intel/auto-round/issues/1480

/root/miniconda3/envs/vllm-glm-int4/bin/python -m vllm.entrypoints.openai.api_server \
    --model $MODEL_ID \
    --served-model-name claude-opus-4-6 \
    --port 80 \
    --trust-remote-code \
    --max-model-len 202752 \
    --tensor-parallel-size 8 \
    --gpu-memory-utilization 0.85 \
    --tool-call-parser glm47 \
    --reasoning-parser glm45 \
    --enable-auto-tool-choice \
    --max-num-seqs 16
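For reference, a minimal client-side probe of a server started this way might look like the sketch below. The payload shape follows vLLM's OpenAI-compatible `/v1/chat/completions` API and reuses the served model name from the command above; the prompt and the garbled-output heuristic are illustrative assumptions, not the original evaluation prompts.

```python
import json
import re

# Served model name taken from the serving command above.
SERVED_MODEL = "claude-opus-4-6"

def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible /v1/chat/completions payload."""
    return {
        "model": SERVED_MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.0,  # deterministic sampling, to make the glitch easier to reproduce
    }

def looks_garbled(text: str, expected: str) -> bool:
    """Heuristic: flag unexpected CJK tokens where `expected` should appear.

    This mirrors the reported symptom (stray tokens like 人事 / 出生
    appearing in place of a regex fragment such as `(.*?)`).
    """
    if expected in text:
        return False
    return re.search(r"[\u4e00-\u9fff]", text) is not None

# Hypothetical probe prompt; POST the payload to http://<host>:80/v1/chat/completions.
payload = build_chat_request("Write a Python regex that lazily matches anything.")
print(json.dumps(payload, ensure_ascii=False))
print(looks_garbled("match with 人事", "(.*?)"))   # garbled response -> True
print(looks_garbled("use (.*?) here", "(.*?)"))   # expected response -> False
```

Running such a probe repeatedly against both the quantized and the original checkpoint would help confirm whether the nonsense tokens are specific to the quantized version.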

Another case of nonsense characters; the text should read "in/output".

Intel org

Could you share some text inputs to reproduce this issue?

It is difficult to reproduce; I use the model through Claude Code, but cases like that show up often.
