| --- |
| |
| frameworks: |
| - Pytorch |
| license: apache-2.0 |
| tags: [] |
| tasks: |
| - text-to-image-synthesis |
| base_model: |
| - Tongyi-MAI/Z-Image |
| base_model_relation: adapter |
| --- |
## Model Introduction
|
|
The i2L (Image to LoRA) model is an architecture we designed around a rather wild idea: it takes an image as input and outputs a LoRA model trained for that image. This model builds on our earlier Qwen-Image-i2L ([model](https://modelscope.cn/models/DiffSynth-Studio/Qwen-Image-i2L), [technical blog](https://modelscope.cn/learn/3343)), further refined and migrated to [Z-Image](https://modelscope.cn/models/Tongyi-MAI/Z-Image), with particular emphasis on strengthening its ability to preserve style.
|
|
To ensure the quality of generated images, we recommend using the LoRA models produced by this model with the following settings:
|
|
* Use a negative prompt
  * Chinese: `"泛黄,发绿,模糊,低分辨率,低质量图像,扭曲的肢体,诡异的外观,丑陋,AI感,噪点,网格感,JPEG压缩条纹,异常的肢体,水印,乱码,意义不明的字符"`
  * English: `"Yellowed, green-tinted, blurry, low-resolution, low-quality image, distorted limbs, eerie appearance, ugly, AI-looking, noise, grid-like artifacts, JPEG compression artifacts, abnormal limbs, watermark, garbled text, meaningless characters"`
* `cfg_scale = 4`
* `sigma_shift = 8`
* Enable the LoRA only on the positive-prompt side and disable it on the negative-prompt side; this improves image quality
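As a quick reference, the recommended settings above can be collected into a single kwargs dict. This is just a sketch: the keys mirror the pipeline-call arguments used in the inference code further down in this card.

```python
# Recommended sampling settings for LoRAs produced by Z-Image-i2L,
# collected into one dict so they can be passed via **recommended_settings.
recommended_settings = {
    "cfg_scale": 4,      # classifier-free guidance scale
    "sigma_shift": 8,    # noise-schedule shift
    "negative_prompt": (
        "Yellowed, green-tinted, blurry, low-resolution, low-quality image, "
        "distorted limbs, eerie appearance, ugly, AI-looking, noise, "
        "grid-like artifacts, JPEG compression artifacts, abnormal limbs, "
        "watermark, garbled text, meaningless characters"
    ),
}
```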
|
|
Online demo: https://modelscope.cn/studios/DiffSynth-Studio/Z-Image-i2L
|
|
## Showcase
|
|
The Z-Image-i2L model can be used to quickly produce a style LoRA from just a few stylistically consistent input images. Below are our generated results; the random seed is 0 in every case.
|
|
### Style 1: Watercolor Painting

Input images: *(grid of 4 reference images omitted)*

Generated images (prompts: `a cat`, `a dog`, `a girl`): *(image grid omitted)*
|
|
### Style 2: Realistic Detail

Input images: *(grid of 5 reference images omitted)*

Generated images (prompts: `a cat`, `a dog`, `a girl`): *(image grid omitted)*
|
|
### Style 3: Vibrant Color Blocks

Input images: *(grid of 6 reference images omitted)*

Generated images (prompts: `a cat`, `a dog`, `a girl`): *(image grid omitted)*
|
|
### Style 4: Girl with Flowers

Input images: *(grid of 4 reference images omitted)*

Generated images (prompts: `a cat`, `a dog`, `a girl`): *(image grid omitted)*
|
|
### Style 5: Minimalist Black and White

Input images: *(grid of 4 reference images omitted)*

Generated images (prompts: `a cat`, `a dog`, `a girl`): *(image grid omitted)*
|
|
### Style 6: Fantasy World

Input images: *(grid of 6 reference images omitted)*

Generated images (prompts: `a cat`, `a dog`, `a girl`): *(image grid omitted)*
|
|
## Inference Code
|
|
Install [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio):
|
|
| ```shell |
| git clone https://github.com/modelscope/DiffSynth-Studio.git |
| cd DiffSynth-Studio |
| pip install -e . |
| ``` |
|
|
Model inference:
|
|
| ```python |
| from diffsynth.pipelines.z_image import ( |
| ZImagePipeline, ModelConfig, |
| ZImageUnit_Image2LoRAEncode, ZImageUnit_Image2LoRADecode |
| ) |
| from modelscope import snapshot_download |
| from safetensors.torch import save_file |
| import torch |
| from PIL import Image |
| |
| # Use `vram_config` to enable LoRA hot-loading |
| vram_config = { |
| "offload_dtype": torch.bfloat16, |
| "offload_device": "cuda", |
| "onload_dtype": torch.bfloat16, |
| "onload_device": "cuda", |
| "preparing_dtype": torch.bfloat16, |
| "preparing_device": "cuda", |
| "computation_dtype": torch.bfloat16, |
| "computation_device": "cuda", |
| } |
| |
| # Load models |
| pipe = ZImagePipeline.from_pretrained( |
| torch_dtype=torch.bfloat16, |
| device="cuda", |
| model_configs=[ |
| ModelConfig(model_id="Tongyi-MAI/Z-Image", origin_file_pattern="transformer/*.safetensors", **vram_config), |
| ModelConfig(model_id="Tongyi-MAI/Z-Image-Turbo", origin_file_pattern="text_encoder/*.safetensors"), |
| ModelConfig(model_id="Tongyi-MAI/Z-Image-Turbo", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"), |
| ModelConfig(model_id="DiffSynth-Studio/General-Image-Encoders", origin_file_pattern="SigLIP2-G384/model.safetensors"), |
| ModelConfig(model_id="DiffSynth-Studio/General-Image-Encoders", origin_file_pattern="DINOv3-7B/model.safetensors"), |
| ModelConfig(model_id="DiffSynth-Studio/Z-Image-i2L", origin_file_pattern="model.safetensors"), |
| ], |
| tokenizer_config=ModelConfig(model_id="Tongyi-MAI/Z-Image-Turbo", origin_file_pattern="tokenizer/"), |
| ) |
| |
| # Load images |
| snapshot_download( |
| model_id="DiffSynth-Studio/Z-Image-i2L", |
| allow_file_pattern="assets/style/*", |
| local_dir="data/Z-Image-i2L_style_input" |
| ) |
| images = [Image.open(f"data/Z-Image-i2L_style_input/assets/style/1/{i}.jpg") for i in range(4)] |
| |
| # Image to LoRA |
| with torch.no_grad(): |
| embs = ZImageUnit_Image2LoRAEncode().process(pipe, image2lora_images=images) |
| lora = ZImageUnit_Image2LoRADecode().process(pipe, **embs)["lora"] |
| save_file(lora, "lora.safetensors") |
| |
| # Generate images |
| prompt = "a cat" |
| negative_prompt = "泛黄,发绿,模糊,低分辨率,低质量图像,扭曲的肢体,诡异的外观,丑陋,AI感,噪点,网格感,JPEG压缩条纹,异常的肢体,水印,乱码,意义不明的字符" |
| image = pipe( |
| prompt=prompt, |
| negative_prompt=negative_prompt, |
| seed=0, cfg_scale=4, num_inference_steps=50, |
| positive_only_lora=lora, |
| sigma_shift=8 |
| ) |
| image.save("image.jpg") |
| ``` |
|
|
|
|