# Qwen2.5-1.5B-SQL-Assistant-Full (Merged)

## Model Overview
Qwen2.5-SQL-Assistant-Full is a standalone fine-tuned language model optimized for Text-to-SQL generation.
It is the merged version of the SQL-Assistant-Prod adapter: the LoRA adapters have been permanently folded into the base model weights,
so the model can be loaded directly with transformers, vLLM, or TGI, or converted to GGUF for local use (e.g., with Ollama), without requiring any PEFT dependencies.
### Key Features
- Architecture: Qwen2.5 (1.5 billion parameters).
- Specialization: Strictly generates SQL queries based on provided database schemas.
- Deployment: Loads as a standard model on high-performance inference servers and hosted providers (vLLM, Groq, Together AI).
- Efficiency: Extremely lightweight (< 4 GB VRAM in FP16), making it suitable for edge devices and CPU-only environments.
## How to Use
Because this is a merged model, usage is standard and simple: you do not need the `peft` library.

### Using Transformers
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# 1. Load the model (standard loading, no PEFT required)
model_id = "manuelaschrittwieser/Qwen2.5-SQL-Assistant-Full"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,  # or torch.float32 for CPU
)

# 2. Define context & question
schema = "CREATE TABLE employees (id INT, name VARCHAR, dept VARCHAR, salary INT)"
question = "Show me the top 3 earners in the Sales department."

# 3. Format input with the chat template
messages = [
    {"role": "system", "content": "You are a SQL expert."},
    {"role": "user", "content": f"{schema}\nQuestion: {question}"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# 4. Generate
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=150)

# 5. Output (keep only the text after the assistant turn)
print(tokenizer.decode(outputs[0], skip_special_tokens=True).split("assistant")[-1].strip())
```
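### Using vLLM (sketch)
Because the adapters are already merged, the checkpoint can also be served as a standard model with vLLM. The snippet below is a minimal sketch, assuming vLLM is installed and that the prompt is pre-formatted with the same chat template as above; it is not the only way to serve the model.

```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = "manuelaschrittwieser/Qwen2.5-SQL-Assistant-Full"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build the prompt with the same chat template used above
messages = [
    {"role": "system", "content": "You are a SQL expert."},
    {"role": "user", "content": "CREATE TABLE employees (id INT, name VARCHAR, dept VARCHAR, salary INT)\nQuestion: Show me the top 3 earners in the Sales department."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the merged checkpoint directly -- no PEFT adapters involved
llm = LLM(model=model_id, dtype="float16")
outputs = llm.generate([prompt], SamplingParams(max_tokens=150, temperature=0.0))
print(outputs[0].outputs[0].text.strip())
```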
## Performance & Evaluation
The model was evaluated using Normalized Exact Match Accuracy against a hold-out test set from the b-mc2/sql-create-context dataset.
| Metric | Score | Notes |
|---|---|---|
| Exact Match | ~78% | High fidelity to schema constraints. |
| Hallucination | < 1% | Rarely invents columns not present in the CREATE TABLE context. |
| Format | 100% | Consistently outputs raw SQL without conversational filler. |
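For reference, the sketch below shows one way to compute normalized exact match; the exact normalization used during evaluation is not documented here, so the lowercasing, whitespace collapsing, and trailing-semicolon removal are assumptions.

```python
import re

def normalize_sql(query: str) -> str:
    """Lowercase, collapse whitespace, and drop a trailing semicolon (assumed normalization)."""
    query = query.strip().lower().rstrip(";")
    return re.sub(r"\s+", " ", query)

def exact_match(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that equal the reference query after normalization."""
    hits = sum(normalize_sql(p) == normalize_sql(r) for p, r in zip(predictions, references))
    return hits / len(references)

# Example: formatting differences do not count as errors
print(exact_match(
    ["SELECT name FROM employees  WHERE dept = 'Sales';"],
    ["select name from employees where dept = 'Sales'"],
))  # -> 1.0
```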
## Training Details
- Original Base Model: Qwen/Qwen2.5-1.5B-Instruct
- Fine-Tuning Method: QLoRA (rank 16, alpha 16).
- Merge Method: `merge_and_unload()` via PEFT (see the sketch after this list).
- Precision: The merged weights are saved in standard precision (FP32/FP16), allowing further quantization (e.g., AWQ, GPTQ, GGUF) if desired.
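For reproducibility, the merge step with PEFT looks roughly like the sketch below. The adapter repository id is illustrative (the card only names the adapter "SQL-Assistant-Prod"), and the output directory is a placeholder.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct", torch_dtype="auto")
adapter_id = "manuelaschrittwieser/SQL-Assistant-Prod"  # illustrative adapter repo id

# Attach the LoRA adapter, then fold its weights into the base model
merged = PeftModel.from_pretrained(base, adapter_id).merge_and_unload()

# Save the standalone merged checkpoint alongside the tokenizer
merged.save_pretrained("Qwen2.5-SQL-Assistant-Full")
AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct").save_pretrained("Qwen2.5-SQL-Assistant-Full")
```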
## Limitations & Bias
- Context Required: The model is optimized for context-dependent SQL generation. It relies on receiving a valid `CREATE TABLE` statement in the prompt to function correctly.
- Read-Only Focus: While it can generate `INSERT`/`UPDATE` queries, it is primarily optimized for data retrieval (`SELECT`).
- Safety: Always validate and sanitize SQL queries generated by LLMs before executing them on production databases to prevent SQL injection risks (see the guard sketched after this list).
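As one possible guardrail (not part of the model itself), generated queries can be checked against a read-only allowlist before execution. The sketch below is a minimal, assumption-laden example of that idea and is not a substitute for parameterized queries or a proper SQL parser.

```python
import re

ALLOWED_PREFIXES = ("select", "with")  # read-only statements only
FORBIDDEN_KEYWORDS = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|grant)\b", re.IGNORECASE
)

def is_safe_select(sql: str) -> bool:
    """Reject anything that is not a single read-only statement."""
    statement = sql.strip().rstrip(";")
    if ";" in statement:  # multiple statements -> reject
        return False
    if not statement.lower().startswith(ALLOWED_PREFIXES):
        return False
    return not FORBIDDEN_KEYWORDS.search(statement)

# Example
print(is_safe_select("SELECT name FROM employees WHERE dept = 'Sales'"))  # True
print(is_safe_select("DROP TABLE employees"))                             # False
```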
## License
This project is licensed under the MIT License.