The Jagged AI Frontier is a Data Frontier 🧭 Why AI capabilities are shaped by data availability • 15
Article Saving Memory Using Padding-Free Transformer Layers during Finetuning Jun 11, 2024 • 21
Article Nemotron 3 Nano - A new Standard for Efficient, Open, and Intelligent Agentic Models 23 days ago • 104
Article Tensor Parallelism (TP) in Transformers: 5 Minutes to Understand Dec 4, 2025 • 63
Fantastic Pretraining Optimizers and Where to Find Them Paper • 2509.02046 • Published Sep 2, 2025 • 13
The Smol Training Playbook 📚 The secrets to building world-class LLMs • 2.81k
Unlocking On-Policy Distillation for Any Model Family 📝 Apply on-policy distillation to any model family • 75
Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published Oct 6, 2025 • 501