SWE-Universe: Scale Real-World Verifiable Environments to Millions Paper • 2602.02361 • Published 10 days ago • 59
VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos Paper • 2510.19488 • Published Oct 22, 2025 • 20
RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques Paper • 2501.14492 • Published Jan 24, 2025 • 27