AcademiClaw: When Students Set Challenges for AI Agents Paper • 2605.02661 • Published 8 days ago • 15
From Context to Skills: Can Language Models Learn from Context Skillfully? Paper • 2604.27660 • Published 9 days ago • 151
AI Co-Mathematician: Accelerating Mathematicians with Agentic AI Paper • 2605.06651 • Published 5 days ago • 11
Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key Paper • 2605.06638 • Published 5 days ago • 13
SkillOS: Learning Skill Curation for Self-Evolving Agents Paper • 2605.06614 • Published 5 days ago • 36
AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery Paper • 2604.25256 • Published 14 days ago • 29
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published 15 days ago • 116
Abstain-R1: Calibrated Abstention and Post-Refusal Clarification via Verifiable RL Paper • 2604.17073 • Published 24 days ago • 9
MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval Paper • 2604.18584 • Published 22 days ago • 15
SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents Paper • 2604.17308 • Published 23 days ago • 22
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence Paper • 2604.18292 • Published 22 days ago • 84
PRL-Bench: A Comprehensive Benchmark Evaluating LLMs' Capabilities in Frontier Physics Research Paper • 2604.15411 • Published 26 days ago • 4
The Amazing Agent Race: Strong Tool Users, Weak Navigators Paper • 2604.10261 • Published 25 days ago • 7
Toward Autonomous Long-Horizon Engineering for ML Research Paper • 2604.13018 • Published 28 days ago • 34
QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation Paper • 2604.08570 • Published Mar 25 • 125