Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories Paper • 2606.11176 • Published 3 days ago • 35
TRL-Bench: Standardizing Cross-Paradigm Representation-Level Evaluation of Tabular Encoders Paper • 2606.09323 • Published 4 days ago • 45
How and What to Imagine? Visual Thinking in Unified Multimodal Models for Cross-View Spatial Reasoning Paper • 2605.27310 • Published 17 days ago • 20
Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models Paper • 2605.21573 • Published 23 days ago • 110
Forecasting Downstream Performance of LLMs With Proxy Metrics Paper • 2605.18607 • Published 25 days ago • 14
RiT: Vanilla Diffusion Transformers Suffice in Representation Space Paper • 2605.21981 • Published 22 days ago • 10
Do Enterprise Systems Need Learned World Models? The Importance of Context to Infer Dynamics Paper • 2605.12178 • Published about 1 month ago • 61
Sema Code: Decoupling AI Coding Agents into Programmable, Embeddable Infrastructure Paper • 2604.11045 • Published Apr 13 • 26
FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios Paper • 2604.07413 • Published Apr 8 • 96
Communicating about Space: Language-Mediated Spatial Integration Across Partial Views Paper • 2603.27183 • Published Mar 28 • 20
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents Paper • 2603.24440 • Published Mar 25 • 98
EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings Paper • 2603.13594 • Published Mar 13 • 149
LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs Paper • 2602.00462 • Published Jan 31 • 21
Grounding Computer Use Agents on Human Demonstrations Paper • 2511.07332 • Published Nov 10, 2025 • 107
REARANK: Reasoning Re-ranking Agent via Reinforcement Learning Paper • 2505.20046 • Published May 26, 2025 • 18
LIVS: A Pluralistic Alignment Dataset for Inclusive Public Spaces Paper • 2503.01894 • Published Feb 27, 2025 • 2
Societal Alignment Frameworks Can Improve LLM Alignment Paper • 2503.00069 • Published Feb 27, 2025 • 17
Language Models' Factuality Depends on the Language of Inquiry Paper • 2502.17955 • Published Feb 25, 2025 • 32
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding Paper • 2502.01341 • Published Feb 3, 2025 • 40