This model consistently produces occasional nonsense sequences.
Thank you for the information. Would you mind sharing the serving command and the evaluation prompts, which we can use to evaluate model quality when producing a new quantized version?
The issue is being tracked here: https://github.com/intel/auto-round/issues/1480
/root/miniconda3/envs/vllm-glm-int4/bin/python -m vllm.entrypoints.openai.api_server \
    --model $MODEL_ID \
    --served-model-name claude-opus-4-6 \
    --port 80 \
    --trust-remote-code \
    --max-model-len 202752 \
    --tensor-parallel-size 8 \
    --gpu-memory-utilization 0.85 \
    --tool-call-parser glm47 \
    --reasoning-parser glm45 \
    --enable-auto-tool-choice \
    --max-num-seqs 16
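As a minimal sketch of how one could smoke-test the server started by the command above: the vLLM OpenAI-compatible API accepts standard chat-completion requests, so a short Python script can send a fixed prompt at temperature 0 and inspect the output for nonsense. The base URL, model name, and prompt below are illustrative assumptions, not values from this thread.

```python
import json
import urllib.request


def build_payload(model, prompt, max_tokens=256):
    """Construct an OpenAI-style chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.0,  # greedy decoding so reruns are comparable
    }


def chat_completion(base_url, model, prompt):
    """POST one chat-completion request to an OpenAI-compatible server."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


# Example (requires the server above to be running; URL is hypothetical):
# print(chat_completion("http://localhost:80", "claude-opus-4-6",
#                       "Summarize what INT4 quantization does in one sentence."))
```

Running the same prompts against the original and the quantized checkpoint makes it easier to spot when a new quantized version starts emitting garbage.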
Could you share some text inputs to reproduce this issue?
It is difficult to reproduce, since I use the model through Claude Code, but cases like that show up often.