-
Woodpecker: Hallucination Correction for Multimodal Large Language Models
Paper • 2310.16045 • Published • 17 -
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Paper • 2310.14566 • Published • 27 -
SILC: Improving Vision Language Pretraining with Self-Distillation
Paper • 2310.13355 • Published • 9 -
Conditional Diffusion Distillation
Paper • 2310.01407 • Published • 20
Collections
Discover the best community collections!
Collections including paper arxiv:2311.12908
-
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Paper • 2403.05135 • Published • 45 -
Understanding Diffusion Objectives as the ELBO with Simple Data Augmentation
Paper • 2303.00848 • Published -
Scalable Diffusion Models with Transformers
Paper • 2212.09748 • Published • 18 -
High-Resolution Image Synthesis with Latent Diffusion Models
Paper • 2112.10752 • Published • 15
-
aMUSEd: An Open MUSE Reproduction
Paper • 2401.01808 • Published • 31 -
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
Paper • 2401.01885 • Published • 28 -
SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity
Paper • 2401.00604 • Published • 6 -
LARP: Language-Agent Role Play for Open-World Games
Paper • 2312.17653 • Published • 33
-
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Paper • 2311.10093 • Published • 58 -
Diffusion Model Alignment Using Direct Preference Optimization
Paper • 2311.12908 • Published • 49 -
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Paper • 2311.09257 • Published • 47
-
LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes
Paper • 2311.13384 • Published • 53 -
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
Paper • 2311.13600 • Published • 47 -
Diffusion Model Alignment Using Direct Preference Optimization
Paper • 2311.12908 • Published • 49 -
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
Paper • 2311.13231 • Published • 28
-
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
Paper • 2312.04410 • Published • 15 -
Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling
Paper • 2310.06389 • Published • 1 -
Diffusion Model Alignment Using Direct Preference Optimization
Paper • 2311.12908 • Published • 49 -
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models
Paper • 2305.13655 • Published • 7
-
One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
Paper • 2306.07967 • Published • 25 -
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
Paper • 2306.07954 • Published • 111 -
TryOnDiffusion: A Tale of Two UNets
Paper • 2306.08276 • Published • 74 -
Seeing the World through Your Eyes
Paper • 2306.09348 • Published • 33
-
Woodpecker: Hallucination Correction for Multimodal Large Language Models
Paper • 2310.16045 • Published • 17 -
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Paper • 2310.14566 • Published • 27 -
SILC: Improving Vision Language Pretraining with Self-Distillation
Paper • 2310.13355 • Published • 9 -
Conditional Diffusion Distillation
Paper • 2310.01407 • Published • 20
-
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Paper • 2403.05135 • Published • 45 -
Understanding Diffusion Objectives as the ELBO with Simple Data Augmentation
Paper • 2303.00848 • Published -
Scalable Diffusion Models with Transformers
Paper • 2212.09748 • Published • 18 -
High-Resolution Image Synthesis with Latent Diffusion Models
Paper • 2112.10752 • Published • 15
-
aMUSEd: An Open MUSE Reproduction
Paper • 2401.01808 • Published • 31 -
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
Paper • 2401.01885 • Published • 28 -
SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity
Paper • 2401.00604 • Published • 6 -
LARP: Language-Agent Role Play for Open-World Games
Paper • 2312.17653 • Published • 33
-
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
Paper • 2312.04410 • Published • 15 -
Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling
Paper • 2310.06389 • Published • 1 -
Diffusion Model Alignment Using Direct Preference Optimization
Paper • 2311.12908 • Published • 49 -
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models
Paper • 2305.13655 • Published • 7
-
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Paper • 2311.10093 • Published • 58 -
Diffusion Model Alignment Using Direct Preference Optimization
Paper • 2311.12908 • Published • 49 -
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Paper • 2311.09257 • Published • 47
-
One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
Paper • 2306.07967 • Published • 25 -
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
Paper • 2306.07954 • Published • 111 -
TryOnDiffusion: A Tale of Two UNets
Paper • 2306.08276 • Published • 74 -
Seeing the World through Your Eyes
Paper • 2306.09348 • Published • 33
-
LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes
Paper • 2311.13384 • Published • 53 -
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
Paper • 2311.13600 • Published • 47 -
Diffusion Model Alignment Using Direct Preference Optimization
Paper • 2311.12908 • Published • 49 -
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
Paper • 2311.13231 • Published • 28