From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence Paper • 2511.18538 • Published Nov 23, 2025 • 282
AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning Paper • 2511.19304 • Published Nov 24, 2025 • 90
VisJudge-Bench: Aesthetics and Quality Assessment of Visualizations Paper • 2510.22373 • Published Oct 25, 2025 • 14
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks Paper • 2511.15065 • Published Nov 19, 2025 • 75
InteractComp: Evaluating Search Agents With Ambiguous Queries Paper • 2510.24668 • Published Oct 28, 2025 • 97
Concise Reasoning, Big Gains: Pruning Long Reasoning Trace with Difficulty-Aware Prompting Paper • 2505.19716 • Published May 26, 2025 • 4
You Don't Know Until You Click: Automated GUI Testing for Production-Ready Software Evaluation Paper • 2508.14104 • Published Aug 17, 2025 • 1
RobustFlow: Towards Robust Agentic Workflow Generation Paper • 2509.21834 • Published Sep 26, 2025 • 2
VeritasFi: An Adaptable, Multi-tiered RAG Framework for Multi-modal Financial Question Answering Paper • 2510.10828 • Published Oct 12, 2025 • 1
ReCode: Unify Plan and Action for Universal Granularity Control Paper • 2510.23564 • Published Oct 27, 2025 • 121
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning Paper • 2503.07459 • Published Mar 10, 2025 • 16
Alpha-SQL: Zero-Shot Text-to-SQL using Monte Carlo Tree Search Paper • 2502.17248 • Published Feb 24, 2025 • 1
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence Paper • 2507.21046 • Published Jul 28, 2025 • 82