Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
- Website
- Community
- Solutions
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2604.28196

Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation

Paper • 2605.03849 • Published 25 days ago • 125
HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation

Paper • 2604.28196 • Published 30 days ago • 72
SWE-WebDevBench: Evaluating Coding Agent Application Platforms as Virtual Software Agencies

Paper • 2605.04637 • Published 24 days ago • 3

ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling

Paper • 2603.25746 • Published Mar 26 • 155
TAPS: Task Aware Proposal Distributions for Speculative Sampling

Paper • 2603.27027 • Published Mar 27 • 144
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models

Paper • 2603.25716 • Published Mar 26 • 156
LongCat-Next: Lexicalizing Modalities as Discrete Tokens

Paper • 2603.27538 • Published Mar 29 • 147

Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling

Paper • 2507.07982 • Published Jul 10, 2025 • 34
MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh

Paper • 2508.01242 • Published Aug 2, 2025 • 11
Loc3R-VLM: Language-based Localization and 3D Reasoning with Vision-Language Models

Paper • 2603.18002 • Published Mar 18 • 13
Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding

Paper • 2603.19235 • Published Mar 19 • 95

HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation

Paper • 2604.28196 • Published 30 days ago • 72
MolmoAct2: Action Reasoning Models for Real-world Deployment

Paper • 2605.02881 • Published 26 days ago • 345
Recursive Multi-Agent Systems

Paper • 2604.25917 • Published Apr 28 • 273
jina-embeddings-v5-omni: Text-Geometry-Preserving Multimodal Embeddings via Frozen-Tower Composition

Paper • 2605.08384 • Published 22 days ago • 11

Papers I'm going to read

about 21 hours ago

LTX-2: Efficient Joint Audio-Visual Foundation Model

Paper • 2601.03233 • Published Jan 6 • 180
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head

Paper • 2601.07832 • Published Jan 12 • 53
Motion Attribution for Video Generation

Paper • 2601.08828 • Published Jan 13 • 72
Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Paper • 2601.19895 • Published Jan 27 • 27

Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation

Paper • 2605.03849 • Published 25 days ago • 125
HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation

Paper • 2604.28196 • Published 30 days ago • 72
SWE-WebDevBench: Evaluating Coding Agent Application Platforms as Virtual Software Agencies

Paper • 2605.04637 • Published 24 days ago • 3

HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation

Paper • 2604.28196 • Published 30 days ago • 72
MolmoAct2: Action Reasoning Models for Real-world Deployment

Paper • 2605.02881 • Published 26 days ago • 345
Recursive Multi-Agent Systems

Paper • 2604.25917 • Published Apr 28 • 273
jina-embeddings-v5-omni: Text-Geometry-Preserving Multimodal Embeddings via Frozen-Tower Composition

Paper • 2605.08384 • Published 22 days ago • 11

ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling

Paper • 2603.25746 • Published Mar 26 • 155
TAPS: Task Aware Proposal Distributions for Speculative Sampling

Paper • 2603.27027 • Published Mar 27 • 144
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models

Paper • 2603.25716 • Published Mar 26 • 156
LongCat-Next: Lexicalizing Modalities as Discrete Tokens

Paper • 2603.27538 • Published Mar 29 • 147

Papers I'm going to read

about 21 hours ago

LTX-2: Efficient Joint Audio-Visual Foundation Model

Paper • 2601.03233 • Published Jan 6 • 180
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head

Paper • 2601.07832 • Published Jan 12 • 53
Motion Attribution for Video Generation

Paper • 2601.08828 • Published Jan 13 • 72
Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Paper • 2601.19895 • Published Jan 27 • 27

Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling

Paper • 2507.07982 • Published Jul 10, 2025 • 34
MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh

Paper • 2508.01242 • Published Aug 2, 2025 • 11
Loc3R-VLM: Language-based Localization and 3D Reasoning with Vision-Language Models

Paper • 2603.18002 • Published Mar 18 • 13
Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding

Paper • 2603.19235 • Published Mar 19 • 95

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs