# SmolLM-135M-GEC-SFT-DPO
A style-preserving grammar correction model based on SmolLM-135M, trained with SFT + DPO to make minimal, targeted corrections while preserving your original writing style.
## Why This Model?
Unlike large language models (GPT, Claude, etc.) that tend to rewrite entire sentences, this model makes minimal, targeted corrections: it fixes only grammatical errors while preserving your vocabulary, tone, and voice. Perfect for:
- Legal documents: Maintain precise legal terminology
- Academic writing: Preserve scholarly tone
- ESL/EFL education: Help learners without changing their ideas
- Professional communications: Keep your authentic voice
## Key Features
- Minimal corrections: Fixes only grammatical errors, doesn't rewrite your sentences
- Style preservation: Maintains your vocabulary, tone, and voice
- Small & efficient: Only 135M parameters (~500MB) - runs on CPU!
- BLEU score: ~0.50 on grammar correction benchmarks
## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("DanJZY/SmolLM-135M-GEC-SFT-DPO")
tokenizer = AutoTokenizer.from_pretrained("DanJZY/SmolLM-135M-GEC-SFT-DPO")

text = "As the number of people grows, the need of habitable environment is essential."
inputs = tokenizer(f"Fix grammar: {text}", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)

# Decode only the newly generated tokens so the echoed prompt is not included
corrected = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)
print(corrected)
```
## Example: Style-Preserving vs. Over-Correction
Original (with error):
"As the number of people grows, the need of habitable environment is essential."
✅ Our Model (Style-Preserving):
"As the number of people grows, the need for a habitable environment is essential."
Only "of" → "for a" is changed; the rest of the sentence is untouched.
❌ Typical Model (Over-Correction):
"As population growth continues, the necessity for a habitable environment becomes essential."
Completely rewritten: the vocabulary, structure, and tone all change.
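The difference between the two behaviors can be made concrete by counting how many words each correction touches. The sketch below uses Python's standard-library `difflib` (the `changed_words` helper is hypothetical, not part of this model's code) to compare both corrections against the original:

```python
import difflib

original = "As the number of people grows, the need of habitable environment is essential."
minimal = "As the number of people grows, the need for a habitable environment is essential."
rewrite = "As population growth continues, the necessity for a habitable environment becomes essential."

def changed_words(src, dst):
    """Count words replaced, deleted, or inserted between two sentences."""
    sm = difflib.SequenceMatcher(a=src.split(), b=dst.split())
    return sum(max(i2 - i1, j2 - j1)
               for op, i1, i2, j1, j2 in sm.get_opcodes()
               if op != "equal")

print(changed_words(original, minimal))  # prints 2: "of" -> "for a"
print(changed_words(original, rewrite))  # much larger: most of the sentence differs
```

A style-preserving correction scores close to zero on this metric; an over-correcting model scores close to the sentence length.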
## Training Details
| Parameter | Value |
|---|---|
| Base model | SmolLM-135M |
| Training method | SFT + DPO (Direct Preference Optimization) |
| Preference pairs | ~19,000 (generated using edit distance) |
| Total experiments | 28 (22 SFT + 6 DPO/IPO) |
| Hardware | 8x RTX 3090 |
| Training time | ~3 hours |
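The table notes that the ~19,000 preference pairs were generated using edit distance. The exact recipe lives in the linked repository, but the core idea can be sketched as follows: given a source sentence and several candidate corrections, label the candidate closest to the source as "chosen" and the one farthest away as "rejected", so DPO learns to prefer minimal edits. The `make_preference_pair` helper below is a hypothetical illustration, not the repo's actual code:

```python
def levenshtein(a, b):
    """Word-level Levenshtein distance via dynamic programming."""
    a, b = a.split(), b.split()
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # delete wa
                           cur[j - 1] + 1,               # insert wb
                           prev[j - 1] + (wa != wb)))    # substitute
        prev = cur
    return prev[-1]

def make_preference_pair(source, candidates):
    """Label the candidate closest to the source as chosen, the farthest as rejected."""
    ranked = sorted(candidates, key=lambda c: levenshtein(source, c))
    return {"prompt": f"Fix grammar: {source}",
            "chosen": ranked[0],
            "rejected": ranked[-1]}
```

Applied to the earlier example, the minimal "for a" correction would be chosen over the full rewrite, which is exactly the preference DPO then reinforces.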
## Resources
| Resource | Link |
|---|---|
| GitHub Repository | ZhuoyuanJiang/SmolLM-GEC-SFT-DPO |
| Full Experiment Checkpoints | Google Drive (~68GB) |
| Training Notebooks | GitHub notebooks/ |
## Intended Use
- Grammar correction for English text
- Writing assistance that preserves author's voice
- Educational tools for language learners
- Proofreading applications
## Limitations
- English only
- Best for sentence-level corrections
- Not designed for stylistic improvements (only grammar)
## Citation

```bibtex
@misc{smollm_gec_sft_dpo_2025,
  title={SmolLM-135M-GEC-SFT-DPO: Style-Preserving Grammar Correction with Direct Preference Optimization},
  author={Zhuoyuan Jiang},
  year={2025},
  url={https://huggingface.co/DanJZY/SmolLM-135M-GEC-SFT-DPO},
  note={Fine-tuned SmolLM-135M for minimal, style-preserving grammatical error correction}
}
```
## Acknowledgments
Special thanks to Nima Tajbakhsh (NVIDIA) for guidance on efficient training methods.