SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality Paper • 2306.14610 • Published Jun 26, 2023 • 2
m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks Paper • 2403.11085 • Published Mar 17, 2024
TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action Paper • 2412.05479 • Published Dec 7, 2024
ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models Paper • 2412.07012 • Published Dec 9, 2024 • 1
Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations Paper • 2506.04633 • Published Jun 5, 2025 • 21
Explain Before You Answer: A Survey on Compositional Visual Reasoning Paper • 2508.17298 • Published Aug 24, 2025 • 4
SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning Paper • 2512.13874 • Published Dec 15, 2025 • 17
Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding Paper • 2601.10611 • Published Jan 15 • 32
VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models Paper • 2603.24575 • Published 6 days ago • 16
MolmoAct: Action Reasoning Models that can Reason in Space Paper • 2508.07917 • Published Aug 11, 2025 • 45
MolmoSpaces: A Large-Scale Open Ecosystem for Robot Navigation and Manipulation Paper • 2602.11337 • Published Feb 11 • 8
MolmoB0T: Large-Scale Simulation Enables Zero-Shot Manipulation Paper • 2603.16861 • Published 14 days ago • 9
Synthetic Visual Genome 2: Extracting Large-scale Spatio-Temporal Scene Graphs from Videos Paper • 2602.23543 • Published Feb 26 • 9
VFig Image2SVG Demo: VFig converts any diagram image into editable SVG code.