ewre324/ewre324-R1-SmolLM2-135M-Distill


I used Open R1 (by Hugging Face) to run supervised fine-tuning (SFT) on my earlier Thinker models. The results are encouraging. Checkpoints are also available.

https://github.com/ewre324/open-r1/tree/main

This follows the DeepSeek R1 method: the model is trained on a dedicated reasoning dataset to encourage more thinking. The ... tags are still not generated. TODO.
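To check the TODO above, here is a minimal sketch that generates text with the `transformers` library and tests whether the output contains a tag pair. The tag names are elided in this card, so they are passed in as parameters. `has_tag_block` and `generate_text` are hypothetical helper names, and plain prompting (no chat template) is an assumption.

```python
def has_tag_block(text: str, open_tag: str, close_tag: str) -> bool:
    """Return True if open_tag appears in text and close_tag appears after it."""
    start = text.find(open_tag)
    if start == -1:
        return False
    return text.find(close_tag, start + len(open_tag)) != -1


def generate_text(
    prompt: str,
    max_new_tokens: int = 256,
    model_id: str = "ewre324/ewre324-R1-SmolLM2-135M-Distill",
) -> str:
    """Generate a completion for prompt (assumes plain-text prompting)."""
    # Imported lazily so has_tag_block works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep special tokens so tags registered as special tokens are not stripped.
    return tokenizer.decode(output_ids[0], skip_special_tokens=False)
```

Calling `has_tag_block(generate_text("What is 12 * 7?"), open_tag, close_tag)` with the expected tag strings would show whether a given checkpoint emits them. The decoded text includes the prompt, so keep the tags out of the prompt itself.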


Safetensors: model size 0.1B params, tensor type F32


Model tree for ewre324/ewre324-R1-SmolLM2-135M-Distill

Base model: HuggingFaceTB/SmolLM2-135M
Finetuned: ewre324/ewre324-Thinker-SmolLM2-135M-Instruct-Reasoning
Finetuned: this model

Dataset used to train ewre324/ewre324-R1-SmolLM2-135M-Distill

Collection including ewre324/ewre324-R1-SmolLM2-135M-Distill

R1 Distill: collection of Distills using Open R1 (2 items, updated Feb 1, 2025)