Fix generation with latest transformers

#88
by kylesayrs - opened

Purpose

  • Fix model generation

Related Issues

Changes

  • The latest transformers release removed support for past_key_values.get_max_length(); the maximum cache length is now retrieved with past_key_values.get_max_cache_shape(). This PR updates the call site accordingly (see the sketch below).
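
A minimal sketch of the fix, wrapped in a hypothetical helper (get_max_cache_length is an illustrative name, not from this PR's diff); it prefers the new Cache API and falls back to the removed method on older transformers releases:

def get_max_cache_length(past_key_values):
    # Prefer the new Cache API (transformers >= 4.47); fall back to the
    # removed method when running against an older release.
    if hasattr(past_key_values, "get_max_cache_shape"):
        return past_key_values.get_max_cache_shape()
    return past_key_values.get_max_length()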

Testing

from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig

# Select model and load it.
MODEL_ID = "deepseek-ai/DeepSeek-V3"

config = AutoConfig.from_pretrained(MODEL_ID, trust_remote_code=True)
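# Drop the quantization config so the checkpoint loads without applying its quantization scheme.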
del config.quantization_config

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    config=config,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

# Confirm generations of the model look sane.
print("\n\n")
print("========== SAMPLE GENERATION ==============")
input_ids = tokenizer("Hello my name is", return_tensors="pt").input_ids.to(model.device)
output = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(output[0]))
print("==========================================\n\n")
kylesayrs changed pull request status to open
