Learn Hard Problems During RL with Reference Guided Fine-tuning Paper • 2603.01223 • Published 3 days ago • 12
Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization Paper • 2602.23008 • Published 6 days ago • 34
CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation Paper • 2602.24286 • Published 5 days ago • 69
Discovering Multiagent Learning Algorithms with Large Language Models Paper • 2602.16928 • Published 14 days ago • 16
"What Are You Doing?": Effects of Intermediate Feedback from Agentic LLM In-Car Assistants During Multi-Step Processing Paper • 2602.15569 • Published 15 days ago • 13
Does Your Reasoning Model Implicitly Know When to Stop Thinking? Paper • 2602.08354 • Published 23 days ago • 256
Principled Synthetic Data Enables the First Scaling Laws for LLMs in Recommendation Paper • 2602.07298 • Published 26 days ago • 4
BitDance: Scaling Autoregressive Generative Models with Binary Tokens Paper • 2602.14041 • Published 17 days ago • 52
Detecting RLVR Training Data via Structural Convergence of Reasoning Paper • 2602.11792 • Published 20 days ago • 2
Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning Paper • 2602.11748 • Published 20 days ago • 30
FLAC: Maximum Entropy RL via Kinetic Energy Regularized Bridge Matching Paper • 2602.12829 • Published 19 days ago • 4
Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs Paper • 2602.10388 • Published 22 days ago • 237
Large Language Lobotomy: Jailbreaking Mixture-of-Experts via Expert Silencing Paper • 2602.08741 • Published 23 days ago • 2
GoodVibe: Security-by-Vibe for LLM-Based Code Generation Paper • 2602.10778 • Published 21 days ago • 3