SmolLM-135M-GEC-SFT-DPO

A style-preserving grammar correction model based on SmolLM-135M, trained with SFT + DPO to make minimal, targeted corrections while preserving your original writing style.

Why This Model?

Unlike large language models (GPT, Claude, etc.) that tend to rewrite entire sentences, this model makes minimal, targeted corrections - fixing only grammatical errors while preserving your vocabulary, tone, and voice. It is well suited for:

  • Legal documents: Maintain precise legal terminology
  • Academic writing: Preserve scholarly tone
  • ESL/EFL education: Help learners without changing their ideas
  • Professional communications: Keep your authentic voice

Key Features

  • Minimal corrections: Fixes only grammatical errors, doesn't rewrite your sentences
  • Style preservation: Maintains your vocabulary, tone, and voice
  • Small & efficient: Only 135M parameters (~500MB) - runs on CPU!
  • BLEU score: ~0.50 on grammar correction benchmarks
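
The ~500MB footprint follows directly from the parameter count stored as 32-bit floats (4 bytes per parameter):

```python
params = 135_000_000        # SmolLM-135M parameter count
bytes_per_param = 4         # F32 weights: 4 bytes each
size_mb = params * bytes_per_param / 1024**2
print(f"{size_mb:.0f} MB")  # roughly 515 MB, in line with the ~500MB figure above
```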

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("DanJZY/SmolLM-135M-GEC-SFT-DPO")
tokenizer = AutoTokenizer.from_pretrained("DanJZY/SmolLM-135M-GEC-SFT-DPO")

text = "As the number of people grows, the need of habitable environment is essential."
inputs = tokenizer(f"Fix grammar: {text}", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)

# The generated sequence includes the prompt tokens; slice them off so that
# only the correction is printed.
correction = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(correction)

Example: Style-Preserving vs Over-Correction

Original (with error):
"As the number of people grows, the need of habitable environment is essential."

✅ Our Model (Style-Preserving):
"As the number of people grows, the need for a habitable environment is essential."
                                         ↑
                            Only fixes "of" → "for a"

❌ Typical Model (Over-Correction):
"As population growth continues, the necessity for a habitable environment becomes essential."
                            ↑
    Completely rewrites: changes vocabulary, structure, and tone
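
One way to quantify the difference between the two behaviors is a word-level diff of each correction against the original. A small sketch using Python's standard difflib (an illustration only, not part of the model's pipeline):

```python
import difflib

def word_edits(original: str, corrected: str):
    """Return (before, after) word spans that differ between the two sentences."""
    a, b = original.split(), corrected.split()
    ops = difflib.SequenceMatcher(a=a, b=b).get_opcodes()
    return [(" ".join(a[i1:i2]), " ".join(b[j1:j2]))
            for tag, i1, i2, j1, j2 in ops if tag != "equal"]

original = "As the number of people grows, the need of habitable environment is essential."
minimal = "As the number of people grows, the need for a habitable environment is essential."
rewrite = "As population growth continues, the necessity for a habitable environment becomes essential."

print(word_edits(original, minimal))  # a single small edit
print(word_edits(original, rewrite))  # several edits across the sentence
```

The style-preserving output yields one tiny edit span, while the over-corrected rewrite touches most of the sentence.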

Training Details

  • Base model: SmolLM-135M
  • Training method: SFT + DPO (Direct Preference Optimization)
  • Preference pairs: ~19,000 (generated using edit distance)
  • Total experiments: 28 (22 SFT + 6 DPO/IPO)
  • Hardware: 8x RTX 3090
  • Training time: ~3 hours
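
The card does not spell out how "edit distance" produced the preference pairs, but one plausible scheme is to prefer the candidate correction that stays closer to the source, i.e. the more style-preserving one. A minimal sketch of that idea (the make_pair rule here is an assumption, not the repository's actual pipeline):

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]

def make_pair(source: str, cand_a: str, cand_b: str) -> dict:
    """Hypothetical pairing rule: the candidate closer to the source is 'chosen'."""
    chosen, rejected = sorted((cand_a, cand_b), key=lambda c: levenshtein(source, c))
    return {"prompt": f"Fix grammar: {source}", "chosen": chosen, "rejected": rejected}

source = "As the number of people grows, the need of habitable environment is essential."
minimal = "As the number of people grows, the need for a habitable environment is essential."
rewrite = "As population growth continues, the necessity for a habitable environment becomes essential."

pair = make_pair(source, minimal, rewrite)  # the minimal edit becomes "chosen"
```

DPO then trains the model to prefer the "chosen" completion over the "rejected" one, which is how a preference signal can push the model toward minimal edits.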

Resources

  • GitHub Repository: ZhuoyuanJiang/SmolLM-GEC-SFT-DPO
  • Full Experiment Checkpoints: Google Drive (~68GB)
  • Training Notebooks: GitHub notebooks/

Intended Use

  • Grammar correction for English text
  • Writing assistance that preserves author's voice
  • Educational tools for language learners
  • Proofreading applications

Limitations

  • English only
  • Best for sentence-level corrections
  • Not designed for stylistic improvements (only grammar)

Citation

@misc{smollm_gec_sft_dpo_2025,
  title={SmolLM-135M-GEC-SFT-DPO: Style-Preserving Grammar Correction with Direct Preference Optimization},
  author={Zhuoyuan Jiang},
  year={2025},
  url={https://huggingface.co/DanJZY/SmolLM-135M-GEC-SFT-DPO},
  note={Fine-tuned SmolLM-135M for minimal, style-preserving grammatical error correction}
}

Acknowledgments

Special thanks to Nima Tajbakhsh (Nvidia) for guidance on efficient training methods.
