LongCat-Next: Lexicalizing Modalities as Discrete Tokens Paper • 2603.27538 • Published Mar 29 • 148
Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities Paper • 2601.21937 • Published Jan 29 • 20
CoDiQ: Test-Time Scaling for Controllable Difficult Question Generation Paper • 2602.01660 • Published Feb 2 • 8
Scaling Embeddings Outperforms Scaling Experts in Language Models Paper • 2601.21204 • Published Jan 29 • 105
Qwen3 Demo Space • Chat with an AI assistant that thinks before answering • 857
OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs Paper • 2510.10689 • Published Oct 12, 2025 • 46
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction Paper • 2508.11987 • Published Aug 16, 2025 • 73
Efficient Agents: Building Effective Agents While Reducing Cost Paper • 2508.02694 • Published Jul 24, 2025 • 86