VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks Paper • 2401.13649 • Published Jan 24, 2024 • 1
Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens Paper • 2603.19232 • Published 23 days ago • 33
Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification Paper • 2603.26648 • Published 15 days ago • 42
UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision Paper • 2601.03193 • Published Jan 6 • 50
GEMS: Agent-Native Multimodal Generation with Memory and Skills Paper • 2603.28088 • Published 13 days ago • 85