NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents Paper • 2512.12730 • Published 25 days ago • 43
Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows Paper • 2512.13168 • Published 24 days ago • 49
WebOperator: Action-Aware Tree Search for Autonomous Agents in Web Environment Paper • 2512.12692 • Published 25 days ago • 13
Nemotron-Pre-Training-Datasets Collection • Large-scale pre-training datasets used in the Nemotron family of models. • 11 items • Updated 16 days ago • 89