Instructions to use longlian/lmd_plus with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Diffusers
How to use longlian/lmd_plus with Diffusers:
pip install -U diffusers transformers accelerate
import torch from diffusers import DiffusionPipeline # switch to "mps" for apple devices pipe = DiffusionPipeline.from_pretrained("longlian/lmd_plus", dtype=torch.bfloat16, device_map="cuda") prompt = "In an indoor scene, a blue cube directly above a red cube with a vase on the left of them" image = pipe(prompt).images[0] - Notebooks
- Google Colab
- Kaggle
- Local Apps
- Draw Things
- DiffusionBee
LMD+ Model Card
Paper | Project Page | 5-minute Blog Post | Demo | Code | Citation | Related work: LLM-grounded Video Diffusion Models
LMD and LMD+ greatly improves the prompt following ability of text-to-image generation models by introducing an LLM as a front-end prompt parser and layout planner. It improves spatial reasoning, the understanding of negation, attribute binding, generative numeracy, etc. in a unified manner without explicitly aiming for each. LMD is completely training-free (i.e., uses SD model off-the-shelf). LMD+ takes in additional adapters for better control. This is a reproduction of LMD+ model used in our work. Our full codebase is at here.
This LMD+ model is based on Stable Diffusion v1.4 and integrates the adapters trained with GLIGEN. The model can be directly used with our LLMGroundedDiffusionPipeline, which is a simplified pipeline of LMD+ without per-box generation.
See the original SD Model Card here.
Cite our work
@article{lian2023llmgrounded,
title={LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models},
author={Lian, Long and Li, Boyi and Yala, Adam and Darrell, Trevor},
journal={arXiv preprint arXiv:2305.13655},
year={2023}
}
- Downloads last month
- 148