CutClaw: Agentic Hours-Long Video Editing via Music Synchronization • Paper • 2603.29664 • Published 1 day ago • 27
LongCat-Next: Lexicalizing Modalities as Discrete Tokens • Paper • 2603.27538 • Published 3 days ago • 109
VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward • Paper • 2603.26599 • Published 5 days ago • 42
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling • Paper • 2603.25746 • Published 6 days ago • 149
Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing • Paper • 2603.12254 • Published 20 days ago • 21
UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation • Paper • 2603.23500 • Published 8 days ago • 35
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model • Paper • 2603.21986 • Published 9 days ago • 119
Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models • Paper • 2603.17051 • Published 15 days ago • 106
MosaicMem: Hybrid Spatial Memory for Controllable Video World Models • Paper • 2603.17117 • Published 15 days ago • 87
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild • Paper • 2603.17187 • Published 15 days ago • 134
ESPIRE: A Diagnostic Benchmark for Embodied Spatial Reasoning of Vision-Language Models • Paper • 2603.13033 • Published 19 days ago • 13
GradMem: Learning to Write Context into Memory with Test-Time Gradient Descent • Paper • 2603.13875 • Published 18 days ago • 34
Learning Latent Proxies for Controllable Single-Image Relighting • Paper • 2603.15555 • Published 16 days ago • 8
From Sparse to Dense: Multi-View GRPO for Flow Models via Augmented Condition Space • Paper • 2603.12648 • Published 19 days ago • 14
DVD: Deterministic Video Depth Estimation with Generative Priors • Paper • 2603.12250 • Published 20 days ago • 26