papers
updated
Boosting Generative Image Modeling via Joint Image-Feature Synthesis
Paper
• 2504.16064
• Published
• 14
LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models
Paper
• 2504.14032
• Published
• 7
Towards Understanding Camera Motions in Any Video
Paper
• 2504.15376
• Published
• 155
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
Paper
• 2504.17192
• Published
• 123
3D Scene Generation: A Survey
Paper
• 2505.05474
• Published
• 21
DDT: Decoupled Diffusion Transformer
Paper
• 2504.05741
• Published
• 77
MonoPlace3D: Learning 3D-Aware Object Placement for 3D Monocular Detection
Paper
• 2504.06801
• Published
• 4
Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction
Paper
• 2504.07961
• Published
• 5
Tokenize Image Patches: Global Context Fusion for Effective Haze Removal in Large Images
Paper
• 2504.09621
• Published
• 11
HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation
Paper
• 2504.13072
• Published
• 13
DMM: Building a Versatile Image Generation Model via Distillation-Based Model Merging
Paper
• 2504.12364
• Published
• 22
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models
Paper
• 2504.05303
• Published
• 5
FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation
Paper
• 2504.07405
• Published
• 11
Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images
Paper
• 2504.08727
• Published
• 12
MIEB: Massive Image Embedding Benchmark
Paper
• 2504.10471
• Published
• 21
BlockGaussian: Efficient Large-Scale Scene Novel View Synthesis via Adaptive Block-Based Gaussian Splatting
Paper
• 2504.09048
• Published
• 7
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers
Paper
• 2504.10483
• Published
• 22
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding
Paper
• 2504.13180
• Published
• 20
Visual Planning: Let's Think Only with Images
Paper
• 2505.11409
• Published
• 57
Constructing a 3D Town from a Single Image
Paper
• 2505.15765
• Published
• 24
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
Paper
• 2505.12448
• Published
• 10
Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision
Paper
• 2506.06253
• Published
• 9
Image Reconstruction as a Tool for Feature Analysis
Paper
• 2506.07803
• Published
• 29
Vision Transformers Don't Need Trained Registers
Paper
• 2506.08010
• Published
• 22