LTX-2: Efficient Joint Audio-Visual Foundation Model Paper • 2601.03233 • Published Jan 6 • 146
LongVideoAgent: Multi-Agent Reasoning with Long Videos Paper • 2512.20618 • Published Dec 23, 2025 • 54
The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text Paper • 2512.16924 • Published Dec 18, 2025 • 27
DocReward: A Document Reward Model for Structuring and Stylizing Paper • 2510.11391 • Published Oct 13, 2025 • 27
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data Paper • 2509.15221 • Published Sep 18, 2025 • 111
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Paper • 2508.14029 • Published Aug 19, 2025 • 118
OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion Paper • 2507.06165 • Published Jul 8, 2025 • 60
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling Paper • 2507.07982 • Published Jul 10, 2025 • 34
Calligrapher: Freestyle Text Image Customization Paper • 2506.24123 • Published Jun 30, 2025 • 37
ImgEdit: A Unified Image Editing Dataset and Benchmark Paper • 2505.20275 • Published May 26, 2025 • 18
MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft Paper • 2504.08388 • Published Apr 11, 2025 • 42
An Empirical Study of GPT-4o Image Generation Capabilities Paper • 2504.05979 • Published Apr 8, 2025 • 64
Video-R1: Reinforcing Video Reasoning in MLLMs Paper • 2503.21776 • Published Mar 27, 2025 • 79
Large Motion Video Autoencoding with Cross-modal Video VAE Paper • 2412.17805 • Published Dec 23, 2024 • 24
RoLoRA Collection [EMNLP2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization • 3 items • Updated Sep 26, 2024 • 3