papers - a fanqics Collection

papers

updated Jun 10, 2025

Boosting Generative Image Modeling via Joint Image-Feature Synthesis

Paper • 2504.16064 • Published Apr 22, 2025
LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models

Paper • 2504.14032 • Published Apr 18, 2025
Towards Understanding Camera Motions in Any Video

Paper • 2504.15376 • Published Apr 21, 2025
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

Paper • 2504.17192 • Published Apr 24, 2025
3D Scene Generation: A Survey

Paper • 2505.05474 • Published May 8, 2025
DDT: Decoupled Diffusion Transformer

Paper • 2504.05741 • Published Apr 8, 2025
MonoPlace3D: Learning 3D-Aware Object Placement for 3D Monocular Detection

Paper • 2504.06801 • Published Apr 9, 2025
Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction

Paper • 2504.07961 • Published Apr 10, 2025
Tokenize Image Patches: Global Context Fusion for Effective Haze Removal in Large Images

Paper • 2504.09621 • Published Apr 13, 2025
HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation

Paper • 2504.13072 • Published Apr 17, 2025
DMM: Building a Versatile Image Generation Model via Distillation-Based Model Merging

Paper • 2504.12364 • Published Apr 16, 2025
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models

Paper • 2504.05303 • Published Apr 7, 2025
FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation

Paper • 2504.07405 • Published Apr 10, 2025
Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images

Paper • 2504.08727 • Published Apr 11, 2025
MIEB: Massive Image Embedding Benchmark

Paper • 2504.10471 • Published Apr 14, 2025
BlockGaussian: Efficient Large-Scale Scene Novel View Synthesis via Adaptive Block-Based Gaussian Splatting

Paper • 2504.09048 • Published Apr 12, 2025
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers

Paper • 2504.10483 • Published Apr 14, 2025
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Paper • 2504.13180 • Published Apr 17, 2025
Visual Planning: Let's Think Only with Images

Paper • 2505.11409 • Published May 16, 2025
Constructing a 3D Town from a Single Image

Paper • 2505.15765 • Published May 21, 2025
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning

Paper • 2505.12448 • Published May 18, 2025
Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision

Paper • 2506.06253 • Published Jun 6, 2025
Image Reconstruction as a Tool for Feature Analysis

Paper • 2506.07803 • Published Jun 9, 2025
Vision Transformers Don't Need Trained Registers

Paper • 2506.08010 • Published Jun 9, 2025