SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer Paper • 2605.15178 • Published 5 days ago • 74
Aligning Latent Geometry for Spherical Flow Matching in Image Generation Paper • 2605.15193 • Published 5 days ago • 6
UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors Paper • 2605.00658 • Published 18 days ago • 82
DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation Paper • 2604.20841 • Published 27 days ago • 24
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models Paper • 2603.25716 • Published Mar 26 • 156
Repurposing Geometric Foundation Models for Multi-view Diffusion Paper • 2603.22275 • Published Mar 23 • 48
Efficiently Reconstructing Dynamic Scenes One D4RT at a Time Paper • 2512.08924 • Published Dec 9, 2025 • 21
Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing Paper • 2603.03143 • Published Mar 3 • 145
Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation Paper • 2312.02145 • Published Dec 4, 2023 • 8
Stroke of Surprise: Progressive Semantic Illusions in Vector Sketching Paper • 2602.12280 • Published Feb 12 • 34
Article We’re open-sourcing our text-to-image model and the process behind it Photoroom • Nov 12, 2025 • 99
CoVT: Chain-of-Visual-Thought Collection Enrich VLMs’ vision-centric reasoning capabilities via Chain-of-Visual-Thought! • 7 items • Updated Nov 25, 2025 • 6
Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge NormalUhr • Feb 7, 2025 • 293
Article FineVideo: behind the scenes +4 mfarre, andito, lewtun, lvwerra, pcuenq, thomwolf • Sep 23, 2024 • 35
Article CinePile 2.0 - making stronger datasets with adversarial refinement +2 RuchitRawal, mfarre, somepago, lvwerra • Oct 23, 2024 • 19