- Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation
  Paper • 2605.03849 • Published • 125
- HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation
  Paper • 2604.28196 • Published • 72
- SWE-WebDevBench: Evaluating Coding Agent Application Platforms as Virtual Software Agencies
  Paper • 2605.04637 • Published • 3
Collections including paper arxiv:2604.28196
- ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling
  Paper • 2603.25746 • Published • 155
- TAPS: Task Aware Proposal Distributions for Speculative Sampling
  Paper • 2603.27027 • Published • 144
- Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models
  Paper • 2603.25716 • Published • 156
- LongCat-Next: Lexicalizing Modalities as Discrete Tokens
  Paper • 2603.27538 • Published • 147

- Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
  Paper • 2507.07982 • Published • 34
- MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh
  Paper • 2508.01242 • Published • 11
- Loc3R-VLM: Language-based Localization and 3D Reasoning with Vision-Language Models
  Paper • 2603.18002 • Published • 13
- Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding
  Paper • 2603.19235 • Published • 95

- HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation
  Paper • 2604.28196 • Published • 72
- MolmoAct2: Action Reasoning Models for Real-world Deployment
  Paper • 2605.02881 • Published • 345
- Recursive Multi-Agent Systems
  Paper • 2604.25917 • Published • 273
- jina-embeddings-v5-omni: Text-Geometry-Preserving Multimodal Embeddings via Frozen-Tower Composition
  Paper • 2605.08384 • Published • 11

- LTX-2: Efficient Joint Audio-Visual Foundation Model
  Paper • 2601.03233 • Published • 180
- MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
  Paper • 2601.07832 • Published • 53
- Motion Attribution for Video Generation
  Paper • 2601.08828 • Published • 72
- Post-LayerNorm Is Back: Stable, ExpressivE, and Deep
  Paper • 2601.19895 • Published • 27