# Qwen2.5-1.5B-SQL-Assistant-Full (Merged)

## Model Overview
Qwen2.5-SQL-Assistant-Full is a standalone fine-tuned language model optimized for Text-to-SQL generation.
It is the merged version of the SQL-Assistant-Prod adapter: the LoRA adapters have been permanently folded into the base model weights,
so the model can be loaded directly with transformers, vLLM, or TGI, or converted to GGUF for local use (e.g., with Ollama), without requiring any PEFT dependencies.
### Key Features
- Architecture: Qwen2.5 (1.5 billion parameters).
- Specialization: Strictly generates SQL queries based on provided database schemas.
- Deployment: Loads as a standard model on high-performance inference servers and hosted providers (vLLM, Groq, Together AI).
- Efficiency: Extremely lightweight (< 4 GB VRAM in FP16), making it suitable for edge devices and CPU-only environments.
## How to Use
Because this is a merged model, usage is standard and simple: you do not need the `peft` library.

### Using Transformers
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# 1. Load the model (standard loading, no PEFT required)
model_id = "manuelaschrittwieser/Qwen2.5-SQL-Assistant-Full"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,  # or torch.float32 for CPU
)

# 2. Define context & question
schema = "CREATE TABLE employees (id INT, name VARCHAR, dept VARCHAR, salary INT)"
question = "Show me the top 3 earners in the Sales department."

# 3. Format input with the chat template
messages = [
    {"role": "system", "content": "You are a SQL expert."},
    {"role": "user", "content": f"{schema}\nQuestion: {question}"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# 4. Generate
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=150)

# 5. Output (keep only the text after the assistant turn)
print(tokenizer.decode(outputs[0], skip_special_tokens=True).split("assistant")[-1].strip())
```
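### Using vLLM (sketch)
Because the adapters are already merged, the checkpoint can also be served as a standard model with vLLM. The snippet below is a minimal sketch, assuming vLLM is installed and that the prompt is pre-formatted with the same chat template as above; it is not the only way to serve the model.

```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = "manuelaschrittwieser/Qwen2.5-SQL-Assistant-Full"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build the prompt with the same chat template used above
messages = [
    {"role": "system", "content": "You are a SQL expert."},
    {"role": "user", "content": "CREATE TABLE employees (id INT, name VARCHAR, dept VARCHAR, salary INT)\nQuestion: Show me the top 3 earners in the Sales department."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the merged checkpoint directly -- no PEFT adapters involved
llm = LLM(model=model_id, dtype="float16")
outputs = llm.generate([prompt], SamplingParams(max_tokens=150, temperature=0.0))
print(outputs[0].outputs[0].text.strip())
```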
## Performance & Evaluation
The model was evaluated using Normalized Exact Match Accuracy against a hold-out test set from the b-mc2/sql-create-context dataset.
| Metric | Score | Notes |
|---|---|---|
| Exact Match | ~78% | High fidelity to schema constraints. |
| Hallucination | < 1% | Rarely invents columns not present in the CREATE TABLE context. |
| Format | 100% | Consistently outputs raw SQL without conversational filler. |
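For reference, the sketch below shows one way to compute normalized exact match; the exact normalization used during evaluation is not documented here, so the lowercasing, whitespace collapsing, and trailing-semicolon removal are assumptions.

```python
import re

def normalize_sql(query: str) -> str:
    """Lowercase, collapse whitespace, and drop a trailing semicolon (assumed normalization)."""
    query = query.strip().lower().rstrip(";")
    return re.sub(r"\s+", " ", query)

def exact_match(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that equal the reference query after normalization."""
    hits = sum(normalize_sql(p) == normalize_sql(r) for p, r in zip(predictions, references))
    return hits / len(references)

# Example: formatting differences do not count as errors
print(exact_match(
    ["SELECT name FROM employees  WHERE dept = 'Sales';"],
    ["select name from employees where dept = 'Sales'"],
))  # -> 1.0
```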
## Training Details
- Original Base Model: Qwen/Qwen2.5-1.5B-Instruct
- Fine-Tuning Method: QLoRA (rank 16, alpha 16).
- Merge Method: `merge_and_unload()` via PEFT (see the sketch after this list).
- Precision: The merged weights are saved in standard precision (FP32/FP16), allowing further quantization (e.g., AWQ, GPTQ, GGUF) if desired.
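For reproducibility, the merge step with PEFT looks roughly like the sketch below. The adapter repository id is illustrative (the card only names the adapter "SQL-Assistant-Prod"), and the output directory is a placeholder.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct", torch_dtype="auto")
adapter_id = "manuelaschrittwieser/SQL-Assistant-Prod"  # illustrative adapter repo id

# Attach the LoRA adapter, then fold its weights into the base model
merged = PeftModel.from_pretrained(base, adapter_id).merge_and_unload()

# Save the standalone merged checkpoint alongside the tokenizer
merged.save_pretrained("Qwen2.5-SQL-Assistant-Full")
AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct").save_pretrained("Qwen2.5-SQL-Assistant-Full")
```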
## Limitations & Bias
- Context Required: The model is optimized for context-dependent SQL generation. It relies on receiving a valid `CREATE TABLE` statement in the prompt to function correctly.
- Read-Only Focus: While it can generate `INSERT`/`UPDATE` queries, it is primarily optimized for data retrieval (`SELECT`).
- Safety: Always validate and sanitize SQL queries generated by LLMs before executing them on production databases to prevent SQL injection risks (see the guard sketched after this list).
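As one possible guardrail (not part of the model itself), generated queries can be checked against a read-only allowlist before execution. The sketch below is a minimal, assumption-laden example of that idea and is not a substitute for parameterized queries or a proper SQL parser.

```python
import re

ALLOWED_PREFIXES = ("select", "with")  # read-only statements only
FORBIDDEN_KEYWORDS = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|grant)\b", re.IGNORECASE
)

def is_safe_select(sql: str) -> bool:
    """Reject anything that is not a single read-only statement."""
    statement = sql.strip().rstrip(";")
    if ";" in statement:  # multiple statements -> reject
        return False
    if not statement.lower().startswith(ALLOWED_PREFIXES):
        return False
    return not FORBIDDEN_KEYWORDS.search(statement)

# Example
print(is_safe_select("SELECT name FROM employees WHERE dept = 'Sales'"))  # True
print(is_safe_select("DROP TABLE employees"))                             # False
```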
## License
This project is licensed under the MIT License.