NitroGen: An Open Foundation Model for Generalist Gaming Agents Paper • 2601.02427 • Published 4 days ago
Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation Paper • 2512.24271 • Published 9 days ago
InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields Paper • 2601.03252 • Published 1 day ago
Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow Paper • 2512.24766 • Published 8 days ago
MAI-UI Technical Report: Real-World Centric Foundation GUI Agents Paper • 2512.22047 • Published 13 days ago
Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition Paper • 2512.15603 • Published 21 days ago
MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence Paper • 2512.10863 • Published 27 days ago
Evaluating Gemini Robotics Policies in a Veo World Simulator Paper • 2512.10675 • Published 28 days ago
SeeNav-Agent: Enhancing Vision-Language Navigation with Visual Prompt and Step-Level Policy Optimization Paper • 2512.02631 • Published Dec 2, 2025
TV2TV: A Unified Framework for Interleaved Language and Video Generation Paper • 2512.05103 • Published Dec 4, 2025
SIMA 2: A Generalist Embodied Agent for Virtual Worlds Paper • 2512.04797 • Published Dec 4, 2025
ProPhy: Progressive Physical Alignment for Dynamic World Simulation Paper • 2512.05564 • Published Dec 5, 2025
COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence Paper • 2512.04563 • Published Dec 4, 2025
Embodied Referring Expression Comprehension in Human-Robot Interaction Paper • 2512.06558 • Published Dec 6, 2025
VideoVLA: Video Generators Can Be Generalizable Robot Manipulators Paper • 2512.06963 • Published Dec 7, 2025
Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation Paper • 2512.08186 • Published about 1 month ago
MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment Paper • 2512.06628 • Published Dec 7, 2025
OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory Paper • 2512.07802 • Published about 1 month ago
Reflection Removal through Efficient Adaptation of Diffusion Transformers Paper • 2512.05000 • Published Dec 4, 2025