diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Apache-2.0 license · 2.7M downloads · 23.3K stars · 743 committers


diffusers - v0.29.0: Stable Diffusion 3 Latest Release

Published by sayakpaul 4 months ago

This release emphasizes Stable Diffusion 3, Stability AI’s latest iteration of the Stable Diffusion family of models. It was introduced in Scaling Rectified Flow Transformers for High-Resolution Image Synthesis by Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, and Robin Rombach.

As the model is gated, before using it with diffusers, you first need to go to the Stable Diffusion 3 Medium Hugging Face page, fill in the form and accept the gate. Once you are in, you need to log in so that your system knows you’ve accepted the gate.

huggingface-cli login

The code below shows how to perform text-to-image generation with SD3:

import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

image = pipe(
    "A cat holding a sign that says hello world",
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image


Refer to our documentation to learn about all the optimizations you can apply to SD3, as well as the image-to-image pipeline.
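
If you want to try image-to-image generation with SD3, here is a minimal sketch. It assumes the StableDiffusion3Img2ImgPipeline class that ships alongside the text-to-image pipeline; the input image URL and prompt are only illustrative:

import torch
from diffusers import StableDiffusion3Img2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusion3Img2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16
).to("cuda")

# Any RGB image can serve as the starting point for image-to-image.
init_image = load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")

image = pipe(
    "a portrait of a scientist, oil painting",
    image=init_image,
    strength=0.6,
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]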

Additionally, we support DreamBooth + LoRA fine-tuning of Stable Diffusion 3 through rectified flow. Check out this directory for more details.

diffusers - v0.28.2: fix `from_single_file` clip model checkpoint key error 🐞

Published by yiyixuxu 5 months ago

  • Change checkpoint key used to identify CLIP models in single file checkpoints by @DN6 in #8319
diffusers - v0.28.1: HunyuanDiT and Transformer2D model class variants

Published by sayakpaul 5 months ago

This patch release primarily introduces the Hunyuan DiT pipeline from the Tencent team.

Hunyuan DiT


Hunyuan DiT is a transformer-based diffusion pipeline, introduced in the Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding paper by the Tencent Hunyuan team.

import torch
from diffusers import HunyuanDiTPipeline

pipe = HunyuanDiTPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16
)
pipe.to("cuda")

# You may also use an English prompt, as HunyuanDiT supports both English and Chinese
# prompt = "An astronaut riding a horse"
prompt = "一个宇航员在骑马"
image = pipe(prompt).images[0]

🧠 This pipeline has support for multi-linguality.

📜 Refer to the official docs here to learn more about it.

Thanks to @gnobitab, for contributing Hunyuan DiT in #8240.

All commits

  • Release: v0.28.0 by @sayakpaul (direct commit on v0.28.1-patch)
  • [Core] Introduce class variants for Transformer2DModel by @sayakpaul in #7647
  • resolve comflicts by @toshas (direct commit on v0.28.1-patch)
  • Tencent Hunyuan Team: add HunyuanDiT related updates by @gnobitab in #8240
  • Tencent Hunyuan Team - Updated Doc for HunyuanDiT by @gnobitab in #8383
  • [Transformer2DModel] Handle norm_type safely while remapping by @sayakpaul in #8370
  • Release: v0.28.1 by @sayakpaul (direct commit on v0.28.1-patch)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @gnobitab
    • Tencent Hunyuan Team: add HunyuanDiT related updates (#8240)
    • Tencent Hunyuan Team - Updated Doc for HunyuanDiT (#8383)

diffusers - v0.28.0: Marigold, PixArt Sigma, AnimateDiff SDXL, and more

Diffusion models are well known for their generative modeling abilities. This release of diffusers introduces the first official pipeline (Marigold) for discriminative tasks such as depth estimation and surface normal estimation!

Starting with this release, we will also highlight the changes and features from the library that make it easier to integrate community checkpoints, features, and so on. Read on!

Marigold

Proposed in Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation, Marigold introduces a diffusion model and an associated fine-tuning protocol for monocular depth estimation. It can also be extended to perform surface normal estimation.


(Image taken from the official repository)

The code snippet below shows how to use this pipeline for depth estimation:

import diffusers
import torch

pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-lcm-v1-0", variant="fp16", torch_dtype=torch.float16
).to("cuda")

image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")
depth = pipe(image)

vis = pipe.image_processor.visualize_depth(depth.prediction)
vis[0].save("einstein_depth.png")

depth_16bit = pipe.image_processor.export_depth_to_16bit_png(depth.prediction)
depth_16bit[0].save("einstein_depth_16bit.png")
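
The same family of pipelines covers surface normals. A minimal sketch, assuming the MarigoldNormalsPipeline class, the prs-eth/marigold-normals-lcm-v0-1 checkpoint, and the visualize_normals helper from the Marigold docs:

import diffusers
import torch

pipe = diffusers.MarigoldNormalsPipeline.from_pretrained(
    "prs-eth/marigold-normals-lcm-v0-1", variant="fp16", torch_dtype=torch.float16
).to("cuda")

image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")
normals = pipe(image)

# Render the predicted surface normals as an RGB visualization.
vis = pipe.image_processor.visualize_normals(normals.prediction)
vis[0].save("einstein_normals.png")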

Check out the API documentation here. We also have a detailed guide about the pipeline here.

Thanks to @toshas, one of the authors of Marigold, who contributed this in #7847.

🌀 Massive Refactor of from_single_file 🌀

We have further refactored from_single_file to align its logic more closely to the from_pretrained method. The biggest benefit of doing this is that it allows us to expand single file loading support beyond Stable Diffusion-like pipelines and models. It also makes it easier to load models that are saved and shared in their original format.

Some of the changes introduced in this refactor:

  1. When loading a single file checkpoint, we will attempt to use the keys present in the checkpoint to infer a model repository on the Hugging Face Hub that we can use to configure the pipeline. For example, if you are using a single file checkpoint based on SD 1.5, we would use the configuration files in the runwayml/stable-diffusion-v1-5 repository to configure the model components and pipeline.
  2. Suppose this inferred configuration isn’t appropriate for your checkpoint. In that case, you can override it using the config argument and pass in either a path to a local model repo or a repo id on the Hugging Face Hub.
pipe = StableDiffusionPipeline.from_single_file("...", config=<model repo id or local repo path>) 
  3. Deprecation of model configuration arguments for the from_single_file method in pipelines, such as num_in_channels, scheduler_type, image_size, and upcast_attention. This is an anti-pattern that we supported in previous versions of the library, when we assumed it would only be relevant to Stable Diffusion based models. However, given the demand to support other model types, we feel it is necessary for single-file loading behavior to adhere to the conventions set in our other loading methods. Configuring individual model components through a pipeline loading method is not something we support in from_pretrained, and therefore, we will be deprecating support for this behavior in from_single_file as well.

PixArt Sigma

PixArt Sigma is the successor to PixArt Alpha. PixArt Sigma is capable of directly generating images at 4K resolution. It can also produce images of markedly higher fidelity and improved alignment with text prompts. It comes with a massive sequence length of 300 (for reference, PixArt Alpha has a maximum sequence length of 120)!

import torch
from diffusers import PixArtSigmaPipeline

# You can replace the checkpoint id with "PixArt-alpha/PixArt-Sigma-XL-2-512-MS" too.
pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS", torch_dtype=torch.float16
)
# Enable memory optimizations.
pipe.enable_model_cpu_offload()

prompt = "A small cactus with a happy face in the Sahara desert."
image = pipe(prompt).images[0]

📃 Refer to the documentation here to learn more about PixArt Sigma.

Thanks to @lawrence-cj, one of the authors of PixArt Sigma, who contributed this in #7857.

AnimateDiff SDXL

@a-r-r-o-w contributed the Stable Diffusion XL (SDXL) version of AnimateDiff in #6721. However, note that this is currently an experimental feature, as only a beta release of the motion adapter checkpoint is available.

import torch
from diffusers.models import MotionAdapter
from diffusers import AnimateDiffSDXLPipeline, DDIMScheduler
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-sdxl-beta", torch_dtype=torch.float16)

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
scheduler = DDIMScheduler.from_pretrained(
    model_id,
    subfolder="scheduler",
    clip_sample=False,
    beta_schedule="linear",
    steps_offset=1,
)
pipe = AnimateDiffSDXLPipeline.from_pretrained(
    model_id,
    motion_adapter=adapter,
    scheduler=scheduler,
    torch_dtype=torch.float16,
    variant="fp16",
)

# enable memory savings
pipe.enable_model_cpu_offload()
pipe.enable_vae_slicing()
pipe.enable_vae_tiling()

output = pipe(
    prompt="a panda surfing in the ocean, realistic, high quality",
    negative_prompt="low quality, worst quality",
    num_inference_steps=20,
    guidance_scale=8,
    width=1024,
    height=1024,
    num_frames=16,
)

frames = output.frames[0]
export_to_gif(frames, "animation.gif")

📜 Refer to the documentation to learn more.

Block-wise LoRA

@UmerHA contributed support for controlling the scales of different LoRA blocks in a granular manner in #7352. Depending on the LoRA checkpoint you are using, this granular control can significantly impact the quality of the generated outputs. The following code block shows how this feature can be used while performing inference:

...

adapter_weight_scales = { "unet": { "down": 0, "mid": 1, "up": 0} }
pipe.set_adapters("pixel", adapter_weight_scales)
image = pipe(
    prompt, num_inference_steps=30, generator=torch.manual_seed(0)
).images[0]

✍️ Refer to our documentation for more details and a full-fledged example.

InstantStyle

More granular control of scale could be extended to IP-Adapters too. @DannHuang contributed support for InstantStyle, i.e., granular control of IP-Adapter scales, in #7668. The following code block shows how this feature could be used when performing inference with IP-Adapters:

...

scale = {
    "down": {"block_2": [0.0, 1.0]},
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
pipeline.set_ip_adapter_scale(scale)

This way, one can generate images following only the style or layout from the image prompt, with significantly improved diversity. This is achieved by activating IP-Adapters only in specific parts of the model.

Check out the documentation here.

ControlNetXS

ControlNet-XS was introduced in ControlNet-XS by Denis Zavadski and Carsten Rother. It is based on the observation that the control model in the original ControlNet can be made much smaller and still produce good results. ControlNet-XS generates images comparable to a regular ControlNet, but it is 20-25% faster (see benchmark with StableDiffusion-XL) and uses ~45% less memory.

ControlNet-XS is supported for both Stable Diffusion and Stable Diffusion XL.
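
A minimal SDXL sketch is shown below. The adapter checkpoint and conditioning image URL are assumptions taken from the ControlNet-XS docs; the conditioning image should be a Canny edge map:

import torch
from diffusers import ControlNetXSAdapter, StableDiffusionXLControlNetXSPipeline
from diffusers.utils import load_image

# Any SDXL Canny ControlNet-XS adapter can be used here.
controlnet = ControlNetXSAdapter.from_pretrained(
    "UmerHA/Testing-ConrolNetXS-SDXL-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetXSPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# The conditioning image is a Canny edge map.
canny_image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/bird_canny.png"
)
image = pipe("a bird", image=canny_image, num_inference_steps=30).images[0]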

Thanks to @UmerHA for contributing ControlNet-XS in #5827 and #6772.

Custom Timesteps

We introduced custom timesteps support for some of our pipelines and schedulers. You can now set your scheduler with a list of arbitrary timesteps. For example, you can use the AYS timesteps schedule to achieve very nice results with only 10 denoising steps.

import torch
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler
from diffusers.schedulers import AysSchedules

sampling_schedule = AysSchedules["StableDiffusionXLTimesteps"]
pipe = StableDiffusionXLPipeline.from_pretrained(
    "SG161222/RealVisXL_V4.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, algorithm_type="sde-dpmsolver++")
prompt = "A cinematic shot of a cute little rabbit wearing a jacket and doing a thumbs up"
image = pipe(prompt=prompt, timesteps=sampling_schedule).images[0]

Check out the documentation here

device_map in Pipelines 🧪

We have introduced experimental support for device_map in our pipelines. This feature becomes relevant when you have multiple accelerators to distribute the components of a pipeline. Currently, we support only “balanced” device_map. However, we plan to support other device mapping strategies relevant to diffusion models in the future.

from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", 
    torch_dtype=torch.float16, 
    device_map="balanced"
)
image = pipeline("a dog").images[0]

In cases where you might be limited to low-VRAM accelerators, you can use device_map to benefit from them. Below, we simulate a situation where we have access to two GPUs, each with only 1GB of VRAM (through the max_memory argument).

from diffusers import DiffusionPipeline
import torch

max_memory = {0:"1GB", 1:"1GB"}
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16, 
    use_safetensors=True, 
    device_map="balanced",
    max_memory=max_memory,
)
image = pipeline("a dog").images[0]

📜 Refer to the documentation to learn more about it.

VQGAN Training Script 📈

VQGAN, proposed in Taming Transformers for High-Resolution Image Synthesis, is a crucial component in the modern generative image modeling toolbox. Once it is trained, its encoder can be leveraged to compute general-purpose tokens from input images.

Thanks to @isamu-isozaki, who contributed a script and related utilities to train VQGANs in #5483. For details, refer to the official training directory.

VideoProcessor Class

Similar to the VaeImageProcessor class, we have introduced a VideoProcessor to help make the preprocessing and postprocessing of videos easier and a little more streamlined across the pipelines. Refer to the documentation to learn more.
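
A minimal sketch of the round trip is below. It assumes the class lives at diffusers.video_processor and accepts a video given as a list of PIL frames; check the docs for the exact input formats supported by your version:

from PIL import Image
from diffusers.video_processor import VideoProcessor

video_processor = VideoProcessor(vae_scale_factor=8)

# A dummy 16-frame video represented as a list of PIL frames.
frames = [Image.new("RGB", (512, 512)) for _ in range(16)]

# Preprocess into a normalized tensor suitable for a video pipeline, then convert back to PIL frames.
video_tensor = video_processor.preprocess_video(frames, height=512, width=512)
pil_frames = video_processor.postprocess_video(video_tensor, output_type="pil")[0]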

New Guides 📑

Starting with this release, we provide guides and tutorials to help users get started with some of the most frequently used tasks in image and video generation. For this release, we have a series of three guides about outpainting with different techniques:

Official Callbacks

We introduced official callbacks that you can conveniently plug into your pipeline. For example, SDXLCFGCutoffCallback turns off classifier-free guidance after a chosen fraction of the denoising steps (cutoff_step_ratio below).

import torch
from diffusers import StableDiffusionXLPipeline
from diffusers.callbacks import SDXLCFGCutoffCallback

callback = SDXLCFGCutoffCallback(cutoff_step_ratio=0.4)
pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")
prompt = "a sports car at the road, best quality, high quality, high detail, 8k resolution"
out = pipeline(
    prompt=prompt,
    num_inference_steps=25,
    callback_on_step_end=callback,
)

Read more on our documentation 📜

Community Pipelines and from_pipe API

Starting with this release note, we will highlight the new community pipelines! More and more of our pipelines were first added as community pipelines and graduated to official pipelines once people started to use them a lot! We do not require community pipelines to follow diffusers’ coding style, so contributing one is the easiest way to contribute to diffusers 😊

We also introduced a from_pipe API that’s very useful for community pipelines that share checkpoints with our official pipelines and improve generation quality in some way :) You can use from_pipe(...) to load many community pipelines without additional memory overhead. With this API, you can easily switch between different pipelines to apply different techniques.

Read more about from_pipe API in our documentation 📃.

Here are four new community pipelines since our last release.

BoxDiff

BoxDiff lets you use bounding box coordinates for more controlled generation. Here is an example of how you can apply this technique to a Stable Diffusion pipeline you have already created (pipe_sd in the example below).
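
For instance, pipe_sd could be created like this (the checkpoint choice is only illustrative):

import torch
from diffusers import DiffusionPipeline

pipe_sd = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe_sd.to("cuda")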

pipe_box = DiffusionPipeline.from_pipe(
    pipe_sd,
    custom_pipeline="pipeline_stable_diffusion_boxdiff",
)
pipe_box.enable_model_cpu_offload()
phrases = ["aurora","reindeer","meadow","lake","mountain"]
boxes = [[1,3,512,202], [75,344,421,495], [1,327,508,507], [2,217,507,341], [1,135,509,242]]
boxes = [[x / 512 for x in box] for box in boxes]

# An illustrative prompt; it should mention the phrases listed above.
prompt = "aurora borealis over a reindeer standing in a meadow next to a lake, with a mountain in the distance"

generator = torch.Generator(device="cpu").manual_seed(42)
images = pipe_box(
    prompt,
    boxdiff_phrases=phrases,
    boxdiff_boxes=boxes,
    boxdiff_kwargs={
        "attention_res": 16,
        "normalize_eot": True
    },
    num_inference_steps=50,
    generator=generator,
).images

Check out this community pipeline here

HD-Painter

HD-Painter can enhance inpainting pipelines with improved prompt faithfulness and higher-resolution generation (up to 2K). You can switch from BoxDiff to HD-Painter like this:

pipe = DiffusionPipeline.from_pipe(
    pipe_box,
    custom_pipeline="hd_painter"
)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

prompt = "wooden boat"
init_image = load_image("https://raw.githubusercontent.com/Picsart-AI-Research/HD-Painter/main/__assets__/samples/images/2.jpg")
mask_image = load_image("https://raw.githubusercontent.com/Picsart-AI-Research/HD-Painter/main/__assets__/samples/masks/2.png")

image = pipe(prompt, init_image, mask_image, use_rasg=True, use_painta=True, generator=torch.manual_seed(12345)).images[0]

Check out this community pipeline here

Differential Diffusion

Differential Diffusion enables customization of the amount of change per pixel or per image region. It’s very effective in inpainting and outpainting.

# `pipe_sdxl` is a previously created StableDiffusionXLPipeline.
pipeline = DiffusionPipeline.from_pipe(
    pipe_sdxl,
    custom_pipeline="pipeline_stable_diffusion_xl_differential_img2img",
).to("cuda")
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config, use_karras_sigmas=True)

prompt = "a green pear"
negative_prompt = "blurry"

image = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=7.5,
    num_inference_steps=25,
    original_image=image,
    image=image,
    strength=1.0,
    map=mask,
).images[0]

Check out this community pipeline here.

FRESCO

FRESCO, short for FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation, enables zero-shot video-to-video translation. Learn more about it here.

All Commits

  • clean dep installation step in push_tests by @sayakpaul in #7382
  • [LoRA test suite] refactor the test suite and cleanse it by @sayakpaul in #7316
  • [Custom Pipelines with Custom Components] fix multiple things by @sayakpaul in #7304
  • Fix typos by @standardAI in #7411
  • fix: enable unet_3d_condition to support time_cond_proj_dim by @yhZhai in #7364
  • add: space within docs to calculate mememory usage. by @sayakpaul (direct commit on v0.28.0-release)
  • Revert "add: space within docs to calculate mememory usage." by @sayakpaul (direct commit on v0.28.0-release)
  • [Docs] add missing output image by @sayakpaul in #7425
  • add a "Community Scripts" section by @yiyixuxu in #7358
  • add: space for calculating memory usagee. by @sayakpaul in #7414
  • [refactor] Fix FreeInit behaviour by @a-r-r-o-w in #7410
  • Remove distutils by @sayakpaul in #7455
  • [IP-Adapter] Fix IP-Adapter Support and Refactor Callback for StableDiffusionPanoramaPipeline by @standardAI in #7262
  • [Research Projects] ORPO diffusion for alignment by @sayakpaul in #7423
  • Additional Memory clean up for slow tests by @DN6 in #7436
  • Fix for str_to_bool definition in testing utils by @DN6 in #7461
  • [Docs] Fix typos by @standardAI in #7451
  • Fixed minor error in test_lora_layers_peft.py by @UmerHA in #7394
  • Small ldm3d fix by @estelleafl in #7464
  • [tests] skip dynamo tests when python is 3.12. by @sayakpaul in #7458
  • feat: support DoRA LoRA from community by @sayakpaul in #7371
  • Fix broken link by @salcc in #7472
  • Update train_dreambooth_lora_sd15_advanced.py by @ernestchu in #7433
  • [Training utils] add kohya conversion dict. by @sayakpaul in #7435
  • Fix Tiling in ConsistencyDecoderVAE by @standardAI in #7290
  • diffusers#7426 fix stable diffusion xl inference on MPS when dtypes shift unexpectedly due to pytorch bugs by @bghira in #7446
  • Fix missing raise statements in check_inputs by @TonyLianLong in #7473
  • Add device arg to offloading with combined pipelines by @Disty0 in #7471
  • fix torch.compile for multi-controlnet of sdxl inpaint by @yiyixuxu in #7476
  • [chore] make the istructions on fetching all commits clearer. by @sayakpaul in #7474
  • Skip test_lora_fuse_nan on mps by @UmerHA in #7481
  • [Chore] Fix Colab notebook links in README.md by @thliang01 in #7495
  • [Modeling utils chore] import load_model_dict_into_meta only once by @sayakpaul in #7437
  • Improve nightly tests by @sayakpaul in #7385
  • add: a helpful message when quality and repo consistency checks fail. by @sayakpaul in #7475
  • apple mps: training support for SDXL (ControlNet, LoRA, Dreambooth, T2I) by @bghira in #7447
  • cpu_offload: remove all hooks before offload by @yiyixuxu in #7448
  • Bug fix for controlnetpipeline check_image by @Fantast616 in #7103
  • fix OOM for test_vae_tiling by @yiyixuxu in #7510
  • [Tests] Speed up some fast pipeline tests by @sayakpaul in #7477
  • Memory clean up on all Slow Tests by @DN6 in #7514
  • Implements Blockwise lora by @UmerHA in #7352
  • Quick-Fix for #7352 block-lora by @UmerHA in #7523
  • add Instant id sdxl image2image pipeline by @linoytsaban in #7507
  • Perturbed-Attention Guidance by @HyoungwonCho in #7512
  • Add final_sigma_zero to UniPCMultistep by @Beinsezii in #7517
  • Fix IP Adapter Support for SAG Pipeline by @Stepheni12 in #7260
  • [Community pipeline] Marigold depth estimation update -- align with marigold v0.1.5 by @markkua in #7524
  • Fix typo in CPU offload test by @DN6 in #7542
  • Fix SVD bug (shape of time_context) by @KimbingNg in #7268
  • fix the cpu offload tests by @yiyixuxu in #7544
  • add HD-Painter pipeline by @haikmanukyan in #7520
  • add a from_pipe method to DiffusionPipeline by @yiyixuxu in #7241
  • [Community pipeline] SDXL Differential Diffusion Img2Img Pipeline by @asomoza in #7550
  • Fix FreeU tests by @DN6 in #7540
  • [Release tests] make nightly workflow dispatchable. by @sayakpaul in #7541
  • [Chore] remove class assignments for linear and conv. by @sayakpaul in #7553
  • [Tests] Speed up fast pipelines part II by @sayakpaul in #7521
  • 7529 do not disable autocast for cuda devices by @bghira in #7530
  • add: utility to format our docs too 📜 by @sayakpaul in #7314
  • UniPC Multistep fix tensor dtype/device on order=3 by @Beinsezii in #7532
  • UniPC Multistep add rescale_betas_zero_snr by @Beinsezii in #7531
  • [Core] refactor transformers 2d into multiple init variants. by @sayakpaul in #7491
  • [Chore] increase number of workers for the tests. by @sayakpaul in #7558
  • Update pipeline_animatediff_video2video.py by @AbhinavGopal in #7457
  • Skip test_freeu_enabled on MPS by @UmerHA in #7570
  • [Tests] reduce block sizes of UNet and VAE tests by @sayakpaul in #7560
  • [IF| add set_begin_index for all IF pipelines by @yiyixuxu in #7577
  • Add AudioLDM2 TTS by @tuanh123789 in #5381
  • Allow more arguments to be passed to convert_from_ckpt by @w4ffl35 in #7222
  • [Docs] fix bugs in callback docs by @Adenialzz in #7594
  • Add missing restore() EMA call in train SDXL script by @christopher-beckham in #7599
  • disable test_conversion_when_using_device_map by @yiyixuxu in #7620
  • Multi-image masking for single IP Adapter by @fabiorigano in #7499
  • add utilities for updating diffusers pipeline metadata. by @sayakpaul in #7573
  • [Core] refactor transformer_2d forward logic into meaningful conditions. by @sayakpaul in #7489
  • [Workflows] remove installation of libsndfile1-dev and libgl1 from workflows by @sayakpaul in #7543
  • [Core] add "balanced" device_map support to pipelines by @sayakpaul in #6857
  • add the option of upsample function for tiny vae by @IDKiro in #7604
  • [docs] remove duplicate tip block. by @sayakpaul in #7625
  • Modularize instruct_pix2pix SD inferencing during and after training in examples by @satani99 in #7603
  • [Tests] reduce the model sizes in the SD fast tests by @sayakpaul in #7580
  • [docs] Prompt enhancer by @stevhliu in #7565
  • [docs] T2I by @stevhliu in #7623
  • Fix cpu offload related slow tests by @yiyixuxu in #7618
  • [Core] fix img2img pipeline for Playground by @sayakpaul in #7627
  • Skip PEFT LoRA Scaling if the scale is 1.0 by @stevenjlm in #7576
  • LCM Distill Scripts Fix Bug when Initializing Target U-Net by @dg845 in #6848
  • Fixed YAML loading. by @YiqinZhao in #7579
  • fix: Replaced deprecated logger.warn with logger.warning by @Sai-Suraj-27 in #7643
  • FIX Setting device for DoRA parameters by @BenjaminBossan in #7655
  • Add (Scheduled) Pseudo-Huber Loss training scripts to research projects by @kabachuha in #7527
  • make docker-buildx mandatory. by @sayakpaul in #7652
  • fix: metadata token by @sayakpaul in #7631
  • don't install peft from the source with uv for now. by @sayakpaul in #7679
  • Fixing implementation of ControlNet-XS by @UmerHA in #6772
  • [Core] is_cosxl_edit arg in SDXL ip2p. by @sayakpaul in #7650
  • [Docs] Add TGATE in section optimization by @WentianZhang-ML in #7639
  • fix: Updated ruff configuration to avoid deprecated configuration warning by @Sai-Suraj-27 in #7637
  • Don't install PEFT with UV in slow tests by @DN6 in #7697
  • [Workflows] remove installation of redundant modules from flax PR tests by @sayakpaul in #7662
  • [Docs] Update TGATE in section optimization. by @WentianZhang-ML in #7698
  • [docs] Pipeline loading by @stevhliu in #7684
  • Add tailscale action to push_test by @glegendre01 in #7709
  • Move IP Adapter Face ID to core by @fabiorigano in #7186
  • adding back test_conversion_when_using_device_map by @yiyixuxu in #7704
  • Cast height, width to int inside prepare latents by @DN6 in #7691
  • Cleanup ControlnetXS by @DN6 in #7701
  • fix: Fixed type annotations for compatability with python 3.8 by @Sai-Suraj-27 in #7648
  • fix/add tailscale key in case of failure by @glegendre01 in #7719
  • Animatediff Controlnet Community Pipeline IP Adapter Fix by @AbhinavGopal in #7413
  • Update Wuerschten Test by @DN6 in #7700
  • Fix Kandinksy V22 tests by @DN6 in #7699
  • [docs] AutoPipeline by @stevhliu in #7714
  • Remove redundant lines by @philipbutler in #7396
  • Support InstantStyle by @DannHuang in #7668
  • Restore AttnProcessor2_0 in unload_ip_adapter by @fabiorigano in #7727
  • fix: Fixed a wrong decorator by modifying it to @classmethod by @Sai-Suraj-27 in #7653
  • [Metadat utils] fix: json lines ordering. by @sayakpaul in #7744
  • [docs] Clean up toctree by @stevhliu in #7715
  • Fix failing VAE tiling test by @DN6 in #7747
  • Fix test for consistency decoder. by @DN6 in #7746
  • PixArt-Sigma Implementation by @lawrence-cj in #7654
  • [PixArt] fix small nits in pixart sigma by @sayakpaul in #7767
  • [Tests] mark UNetControlNetXSModelTests::test_forward_no_control to be flaky by @sayakpaul in #7771
  • Fix lora device test by @sayakpaul in #7738
  • [docs] Reproducible pipelines by @stevhliu in #7769
  • [docs] Refactor image quality docs by @stevhliu in #7758
  • Convert RGB to BGR for the SDXL watermark encoder by @btlorch in #7013
  • [docs] Fix AutoPipeline docstring by @stevhliu in #7779
  • Add PixArtSigmaPipeline to AutoPipeline mapping by @Beinsezii in #7783
  • [Docs] Update image masking and face id example by @fabiorigano in #7780
  • Add DREAM training by @AmericanPresidentJimmyCarter in #6381
  • [Scheduler] introduce sigma schedule. by @sayakpaul in #7649
  • Update InstantStyle usage in IP-Adapter documentation by @DannHuang in #7806
  • Check for latents, before calling prepare_latents - sdxlImg2Img by @nileshkokane01 in #7582
  • Add debugging workflow by @DN6 in #7778
  • [Pipeline] Fix error of SVD pipeline when num_videos_per_prompt > 1 by @wuyushuwys in #7786
  • Safetensor loading in AnimateDiff conversion scripts by @DN6 in #7764
  • Adding TextualInversionLoaderMixin for the controlnet_inpaint_sd_xl pipeline by @jschoormans in #7288
  • Added get_velocity function to EulerDiscreteScheduler. by @RuiningLi in #7733
  • Set main_input_name in StableDiffusionSafetyChecker to "clip_input" by @clinty in #7500
  • [Tests] reduce the model size in the ddim fast test by @ariG23498 in #7803
  • [Tests] reduce the model size in the ddpm fast test by @ariG23498 in #7797
  • [Tests] reduce the model size in the amused fast test by @ariG23498 in #7804
  • [Core] introduce _no_split_modules to ModelMixin by @sayakpaul in #6396
  • Add B-Lora training option to the advanced dreambooth lora script by @linoytsaban in #7741
  • SSH Runner Workflow Update by @DN6 in #7822
  • Fix CPU offload in docstring by @standardAI in #7827
  • [docs] Community pipelines by @stevhliu in #7819
  • Fix for pipeline slow test fetcher by @DN6 in #7824
  • [Tests] fix: device map tests for models by @sayakpaul in #7825
  • update the logic of is_sequential_cpu_offload by @yiyixuxu in #7788
  • [ip-adapter] fix ip-adapter for StableDiffusionInstructPix2PixPipeline by @yiyixuxu in #7820
  • [Tests] reduce the model size in the audioldm fast test by @ariG23498 in #7833
  • Fix key error for dictionary with randomized order in convert_ldm_unet_checkpoint by @yunseongcho in #7680
  • Fix hanging pipeline fetching by @DN6 in #7837
  • Update download diff format tests by @DN6 in #7831
  • Update CI cache by @DN6 in #7832
  • move to new runners by @glegendre01 in #7839
  • Change GPU Runners by @glegendre01 in #7840
  • Update deps for pipe test fetcher by @DN6 in #7838
  • [Tests] reduce the model size in the blipdiffusion fast test by @ariG23498 in #7849
  • Respect resume_download deprecation by @Wauplin in #7843
  • Remove installing python again in container by @DN6 in #7852
  • Add Ascend NPU support for SDXL fine-tuning and fix the model saving bug when using DeepSpeed. by @HelloWorldBeginner in #7816
  • [docs] LCM by @stevhliu in #7829
  • Ci - change cache folder by @glegendre01 in #7867
  • [docs] Distilled inference by @stevhliu in #7834
  • Fix for "no lora weight found module" with some loras by @asomoza in #7875
  • 7879 - adjust documentation to use naruto dataset, since pokemon is now gated by @bghira in #7880
  • Modification on the PAG community pipeline (re) by @HyoungwonCho in #7876
  • Fix image upcasting by @standardAI in #7858
  • Check shape and remove deprecated APIs in scheduling_ddpm_flax.py by @ppham27 in #7703
  • [Pipeline] AnimateDiff SDXL by @a-r-r-o-w in #6721
  • fix offload test by @yiyixuxu in #7868
  • Allow users to save SDXL LoRA weights for only one text encoder by @dulacp in #7607
  • Remove dead code and fix f-string issue by @standardAI in #7720
  • Fix several imports by @standardAI in #7712
  • [Refactor] Better align from_single_file logic with from_pretrained by @DN6 in #7496
  • [Tests] fix things after #7013 by @sayakpaul in #7899
  • Set max parallel jobs on slow test runners by @DN6 in #7878
  • fix _optional_components in StableCascadeCombinedPipeline by @yiyixuxu in #7894
  • [scheduler] support custom timesteps and sigmas by @yiyixuxu in #7817
  • upgrade to python 3.10 in the Dockerfiles by @sayakpaul in #7893
  • add missing image processors to the docs by @sayakpaul in #7910
  • [Core] introduce videoprocessor. by @sayakpaul in #7776
  • #7535 Update FloatTensor type hints to Tensor by @vanakema in #7883
  • fix bugs when using deepspeed in sdxl by @HelloWorldBeginner in #7917
  • add custom sigmas and timesteps for StableDiffusionXLControlNet pipeline by @neuron-party in #7913
  • fix: Fixed a wrong link to supported python versions in contributing.md file by @Sai-Suraj-27 in #7638
  • [Core] fix offload behaviour when device_map is enabled. by @sayakpaul in #7919
  • Add Ascend NPU support for SDXL. by @HelloWorldBeginner in #7916
  • Official callbacks by @asomoza in #7761
  • fix AnimateDiff creation with a unet loaded with IP Adapter by @fabiorigano in #7791
  • [LoRA] Fix LoRA tests (side effects of RGB ordering) part ii by @sayakpaul in #7932
  • fix multicontrolnet save_pretrained logic for compatibility by @rebel-kblee in #7821
  • Update requirements.txt for text_to_image by @ktakita1011 in #7892
  • Bump transformers from 4.36.0 to 4.38.0 in /examples/research_projects/realfill by @dependabot[bot] in #7635
  • fix VAE loading issue in train_dreambooth by @bssrdf in #7632
  • Expansion proposal of diffusers-cli env by @standardAI in #7403
  • update to use hf-workflows for reporting the Docker build statuses by @sayakpaul in #7938
  • [Core] separate the loading utilities in modeling similar to pipelines. by @sayakpaul in #7943
  • Fix added_cond_kwargs when using IP-Adapter in StableDiffusionXLControlNetInpaintPipeline by @detkov in #7924
  • [Pipeline] Adding BoxDiff to community examples by @zjysteven in #7947
  • [tests] decorate StableDiffusion21PipelineSingleFileSlowTests with slow. by @sayakpaul in #7941
  • Adding VQGAN Training script by @isamu-isozaki in #5483
  • move to GH hosted M1 runner by @glegendre01 in #7949
  • [Workflows] add a workflow that can be manually triggered on a PR. by @sayakpaul in #7942
  • refactor: Refactored code by Merging isinstance calls by @Sai-Suraj-27 in #7710
  • Fix the text tokenizer name in logger warning of PixArt pipelines by @liang-hou in #7912
  • Fix AttributeError in train_lcm_distill_lora_sdxl_wds.py by @jainalphin in #7923
  • Consistent SDXL Controlnet callback tensor inputs by @asomoza in #7958
  • remove unsafe workflow. by @sayakpaul in #7967
  • [tests] fix Pixart Sigma tests by @sayakpaul in #7966
  • Fix typo in "attention" by @jacobmarks in #7977
  • Update pipeline_controlnet_inpaint_sd_xl.py by @detkov in #7983
  • [docs] add doc for PixArtSigmaPipeline by @lawrence-cj in #7857
  • Passing cross_attention_kwargs to StableDiffusionInstructPix2PixPipeline by @AlexeyZhuravlev in #7961
  • fix: Fixed few docstrings according to the Google Style Guide by @Sai-Suraj-27 in #7717
  • Make VAE compatible to torch.compile() by @rootonchair in #7984
  • [docs] VideoProcessor by @stevhliu in #7965
  • Use HF_TOKEN env var in CI by @Wauplin in #7993
  • fix: Attribute error in Logger object (logger.warning) by @AMohamedAakhil in #8183
  • Remove unnecessary single file tests for SD Cascade UNet by @DN6 in #7996
  • Fix resize issue in SVD pipeline with VideoProcessor by @DN6 in #8229
  • Create custom container for doc builder by @DN6 in #8263
  • Use freedesktop_os_release() in diffusers cli for Python >=3.10 by @DN6 in #8235
  • [Community Pipeline] FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation by @SingleZombie in #8239
  • [Chore] run the documentation workflow in a custom container. by @sayakpaul in #8266
  • Respect resume_download deprecation V2 by @Wauplin in #8267
  • Clean up from_single_file docs by @DN6 in #8268
  • sampling bug fix in diffusers tutorial "basic_training.md" by @yue-here in #8223
  • Fix a grammatical error in the raise messages by @standardAI in #8272
  • Fix CPU Offloading Usage & Typos by @standardAI in #8230
  • Add details about 1-stage implementation in I2VGen-XL docs by @dhaivat1729 in #8282
  • [Workflows] add a more secure way to run tests from a PR. by @sayakpaul in #7969
  • Add zip package to doc builder image by @DN6 in #8284
  • [Pipeline] Marigold depth and normals estimation by @toshas in #7847
  • Release: v0.28.0 by @sayakpaul (direct commit on v0.28.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @standardAI
    • Fix typos (#7411)
    • [IP-Adapter] Fix IP-Adapter Support and Refactor Callback for StableDiffusionPanoramaPipeline (#7262)
    • [Docs] Fix typos (#7451)
    • Fix Tiling in ConsistencyDecoderVAE (#7290)
    • Fix CPU offload in docstring (#7827)
    • Fix image upcasting (#7858)
    • Remove dead code and fix f-string issue (#7720)
    • Fix several imports (#7712)
    • Expansion proposal of diffusers-cli env (#7403)
    • Fix a grammatical error in the raise messages (#8272)
    • Fix CPU Offloading Usage & Typos (#8230)
  • @a-r-r-o-w
    • [refactor] Fix FreeInit behaviour (#7410)
    • [Pipeline] AnimateDiff SDXL (#6721)
  • @UmerHA
    • Fixed minor error in test_lora_layers_peft.py (#7394)
    • Skip test_lora_fuse_nan on mps (#7481)
    • Implements Blockwise lora (#7352)
    • Quick-Fix for #7352 block-lora (#7523)
    • Skip test_freeu_enabled on MPS (#7570)
    • Fixing implementation of ControlNet-XS (#6772)
  • @bghira
    • diffusers#7426 fix stable diffusion xl inference on MPS when dtypes shift unexpectedly due to pytorch bugs (#7446)
    • apple mps: training support for SDXL (ControlNet, LoRA, Dreambooth, T2I) (#7447)
    • 7529 do not disable autocast for cuda devices (#7530)
    • 7879 - adjust documentation to use naruto dataset, since pokemon is now gated (#7880)
  • @HyoungwonCho
    • Perturbed-Attention Guidance (#7512)
    • Modification on the PAG community pipeline (re) (#7876)
  • @haikmanukyan
    • add HD-Painter pipeline (#7520)
  • @fabiorigano
    • Multi-image masking for single IP Adapter (#7499)
    • Move IP Adapter Face ID to core (#7186)
    • Restore AttnProcessor2_0 in unload_ip_adapter (#7727)
    • [Docs] Update image masking and face id example (#7780)
    • fix AnimateDiff creation with a unet loaded with IP Adapter (#7791)
  • @kabachuha
    • Add (Scheduled) Pseudo-Huber Loss training scripts to research projects (#7527)
  • @lawrence-cj
    • PixArt-Sigma Implementation (#7654)
    • [docs] add doc for PixArtSigmaPipeline (#7857)
  • @vanakema
    • #7535 Update FloatTensor type hints to Tensor (#7883)
  • @zjysteven
    • [Pipeline] Adding BoxDiff to community examples (#7947)
  • @isamu-isozaki
    • Adding VQGAN Training script (#5483)
  • @SingleZombie
    • [Community Pipeline] FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation (#8239)
  • @toshas
    • [Pipeline] Marigold depth and normals estimation (#7847)


diffusers - v0.27.1: Clear `scale` argument confusion for LoRA

Published by sayakpaul 7 months ago

All commits

  • Release: v0.27.0 by @DN6 (direct commit on v0.27.1-patch)
  • [LoRA] pop the LoRA scale so that it doesn't get propagated to the weeds by @sayakpaul in #7338
  • Release: 0.27.1-patch by @sayakpaul (direct commit on v0.27.1-patch)

diffusers - v0.27.0: Stable Cascade, Playground v2.5, EDM-style training, and more

Stable Cascade

We are adding support for a new text-to-image model building on Würstchen called Stable Cascade, which comes with a non-commercial license. The Stable Cascade line of pipelines differs from Stable Diffusion in that it is built upon three distinct models and allows for hierarchical compression of images, achieving remarkable outputs.

from diffusers import StableCascadePriorPipeline, StableCascadeDecoderPipeline
import torch

prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior",
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image_emb = prior(prompt=prompt).image_embeddings[0]

decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = decoder(image_embeddings=image_emb, prompt=prompt).images[0]
image

📜 Check out the docs here to know more about the model.

Note: You will need torch>=2.2.0 to use the torch.bfloat16 data type with the Stable Cascade pipeline.

Playground v2.5

PlaygroundAI released a new v2.5 model (playgroundai/playground-v2.5-1024px-aesthetic), which particularly excels at aesthetics. The model closely follows the architecture of Stable Diffusion XL, except for a few tweaks. This release comes with support for this model:

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "playgroundai/playground-v2.5-1024px-aesthetic",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt, num_inference_steps=50, guidance_scale=3).images[0]
image

Loading from the original single-file checkpoint is also supported:

from diffusers import StableDiffusionXLPipeline, EDMDPMSolverMultistepScheduler
import torch

url = "https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic/blob/main/playground-v2.5-1024px-aesthetic.safetensors"
pipeline = StableDiffusionXLPipeline.from_single_file(url)
pipeline.to(device="cuda", dtype=torch.float16)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image  = pipeline(prompt=prompt, guidance_scale=3.0).images[0]
image.save("playground_test_image.png")

You can also perform LoRA DreamBooth training with the playgroundai/playground-v2.5-1024px-aesthetic checkpoint:

accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path="playgroundai/playground-v2.5-1024px-aesthetic"  \
  --instance_data_dir="dog" \
  --output_dir="dog-playground-lora" \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-4 \
  --use_8bit_adam \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=25 \
  --seed="0" \
  --push_to_hub

To know more, follow the instructions here.

EDM-style training support

EDM refers to the training and sampling techniques introduced in the following paper: Elucidating the Design Space of Diffusion-Based Generative Models. We have introduced support for training using the EDM formulation in our train_dreambooth_lora_sdxl.py script.

To train stabilityai/stable-diffusion-xl-base-1.0 using the EDM formulation, you just have to specify the --do_edm_style_training flag in your training command, and voila 🤗

If you’re interested in extending this formulation to other training scripts, we refer you to this PR.

New schedulers with the EDM formulation

To better support the Playground v2.5 model and EDM-style training in general, we are bringing support for EDMDPMSolverMultistepScheduler and EDMEulerScheduler. These support the EDM formulations of the DPMSolverMultistepScheduler and EulerDiscreteScheduler, respectively.
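
As a minimal sketch, assuming the Playground v2.5 checkpoint's scheduler config is compatible with from_config, you can swap in the Euler EDM variant like this:

import torch
from diffusers import DiffusionPipeline, EDMEulerScheduler

pipe = DiffusionPipeline.from_pretrained(
    "playgroundai/playground-v2.5-1024px-aesthetic",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Replace the default EDM DPM-Solver++ scheduler with the Euler EDM scheduler.
pipe.scheduler = EDMEulerScheduler.from_config(pipe.scheduler.config)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt, num_inference_steps=50, guidance_scale=3).images[0]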

Trajectory Consistency Distillation

Trajectory Consistency Distillation (TCD) enables a model to generate higher-quality and more detailed images with fewer steps. Moreover, owing to effective error mitigation during the distillation process, TCD demonstrates superior performance even with a large number of inference steps. It was proposed in Trajectory Consistency Distillation.

This release comes with support for a TCDScheduler that enables this kind of fast sampling. Much like LCM-LoRA, TCD requires an additional adapter for the acceleration. The code snippet below shows an example usage:

import torch
from diffusers import StableDiffusionXLPipeline, TCDScheduler

device = "cuda"
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
tcd_lora_id = "h1t/TCD-SDXL-LoRA"

pipe = StableDiffusionXLPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to(device)
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights(tcd_lora_id)
pipe.fuse_lora()

prompt = "Painting of the orange cat Otto von Garfield, Count of Bismarck-Schönhausen, Duke of Lauenburg, Minister-President of Prussia. Depicted wearing a Prussian Pickelhaube and eating his favorite meal - lasagna."

image = pipe(
    prompt=prompt,
    num_inference_steps=4,
    guidance_scale=0,
    eta=0.3, 
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]


📜 Check out the docs here to know more about TCD.

Many thanks to @mhh0318 for contributing the TCDScheduler in #7174 and the guide in #7259.

IP-Adapter image embeddings and masking

All the pipelines supporting IP-Adapter accept an ip_adapter_image_embeds argument. If you need to run the IP-Adapter multiple times with the same image, you can encode the image once and save the embedding to disk. This saves computation time and is especially useful when building UIs. Additionally, ComfyUI image embeddings for IP-Adapters are fully compatible with Diffusers and should work out of the box.
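
A minimal sketch of this caching pattern is below. It assumes pipeline is an IP-Adapter-enabled pipeline with the adapter already loaded, and that the prepare_ip_adapter_image_embeds helper (see #7016 in the commit list) is available with this signature in your version:

import torch

# Encode the IP-Adapter image once and cache the embeddings on disk.
image_embeds = pipeline.prepare_ip_adapter_image_embeds(
    ip_adapter_image=image,
    ip_adapter_image_embeds=None,
    device="cuda",
    num_images_per_prompt=1,
    do_classifier_free_guidance=True,
)
torch.save(image_embeds, "image_embeds.ipadpt")

# Later runs reuse the cached embeddings and skip the image encoder entirely.
image_embeds = torch.load("image_embeds.ipadpt")
images = pipeline(
    prompt="a polar bear sitting in a chair drinking a milkshake",
    ip_adapter_image_embeds=image_embeds,
    num_inference_steps=50,
).images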

We have also introduced support for providing binary masks to specify which portion of the output image should be assigned to an IP-Adapter. For each input IP-Adapter image, a binary mask and an IP-Adapter must be provided. Thanks to @fabiorigano for contributing this feature through #6847.

📜 To know about the exact usage of both of the above, refer to our official guide.

We thank our community members, @fabiorigano, @asomoza, and @cubiq, for their guidance and input on these features.

Guide on merging LoRAs

Merging LoRAs can be a fun and creative way to create new and unique images. Diffusers provides merging support with the set_adapters method which concatenates the weights of the LoRAs to merge.
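
For instance, a minimal sketch with two hypothetical SDXL LoRAs (the repo names, weight file names, and adapter names below are placeholders):

# `pipe` is an SDXL pipeline with LoRA support; the repos and file names are placeholders.
pipe.load_lora_weights("your-user/lora-style-a", weight_name="style_a.safetensors", adapter_name="style_a")
pipe.load_lora_weights("your-user/lora-style-b", weight_name="style_b.safetensors", adapter_name="style_b")

# set_adapters activates both LoRAs with per-adapter weights, concatenating their weights for merging.
pipe.set_adapters(["style_a", "style_b"], adapter_weights=[0.7, 0.8])
image = pipe("a picture blending both styles", num_inference_steps=30).images[0]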

Now, Diffusers also supports the add_weighted_adapter method from the PEFT library, unlocking more efficient merging methods like TIES, DARE, linear, and even combinations of these merging methods like dare_ties.

📜 Take a look at the Merge LoRAs guide to learn more about merging in Diffusers.

LEDITS++

We are adding support for the real-image editing technique LEDITS++: Limitless Image Editing using Text-to-Image Models, a parameter-free method that requires no fine-tuning or optimization.
To edit real images, the LEDITS++ pipelines first invert the image using the DPM-solver++ scheduler, which facilitates editing with as little as 20 total diffusion steps for inversion and inference combined. LEDITS++ guidance is defined such that it reflects both the direction of the edit (whether we want to push away from or towards the edit concept) and the strength of the effect. The guidance also includes a masking term focused on relevant image regions which, especially for multiple edits, ensures that the corresponding guidance terms for each concept remain mostly isolated, limiting interference.

The code snippet below shows an example usage:

import torch
import PIL
import requests
from io import BytesIO
from diffusers import LEditsPPPipelineStableDiffusionXL, AutoencoderKL

device = "cuda"
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"

vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)

pipe = LEditsPPPipelineStableDiffusionXL.from_pretrained(
    base_model_id, 
    vae=vae, 
    torch_dtype=torch.float16
).to(device)

def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")

img_url = "https://www.aiml.informatik.tu-darmstadt.de/people/mbrack/tennis.jpg"
image = download_image(img_url)

_ = pipe.invert(
    image=image,
    num_inversion_steps=50,
    skip=0.2
)

edited_image = pipe(
    editing_prompt=["tennis ball", "tomato"],
    reverse_editing_direction=[True, False],
    edit_guidance_scale=[5.0, 10.0],
    edit_threshold=[0.9, 0.85],
).images[0]

📜 Check out the docs here to learn more about LEDITS++.

Thanks to @manuelbrack for contributing this in #6074.

All commits

  • Fix flaky IP Adapter test by @DN6 in #6960
  • Move SDXL T2I Adapter lora test into PEFT workflow by @DN6 in #6965
  • Allow passing config_file argument to ControlNetModel when using from_single_file by @DN6 in #6959
  • [PEFT / docs] Add a note about torch.compile by @younesbelkada in #6864
  • [Core] Harmonize single file ckpt model loading by @sayakpaul in #6971
  • fix: controlnet inpaint single file. by @sayakpaul in #6975
  • [docs] IP-Adapter by @stevhliu in #6897
  • fix IPAdapter unload_ip_adapter test by @yiyixuxu in #6972
  • [advanced sdxl lora script] - fix #6967 bug when using prior preservation loss by @linoytsaban in #6968
  • [IP Adapters] feat: allow low_cpu_mem_usage in ip adapter loading by @sayakpaul in #6946
  • Fix diffusers import prompt2prompt by @ihkap11 in #6927
  • add: peft to the benchmark workflow by @sayakpaul in #6989
  • Fix procecss process by @co63oc in #6591
  • Standardize model card for textual inversion sdxl by @Stepheni12 in #6963
  • Update textual_inversion.py by @Bhavay-2001 in #6952
  • [docs] Fix callout by @stevhliu in #6998
  • [docs] Video generation by @stevhliu in #6701
  • start depcrecation cycle for lora_attention_proc 👋 by @sayakpaul in #7007
  • Add documentation for strength parameter in Controlnet_img2img pipelines by @tlpss in #6951
  • Fixed typos in dosctrings of init() and in forward() of Unet3DConditionModel by @MK-2012 in #6663
  • [SVD] fix a bug when passing image as tensor by @yiyixuxu in #6999
  • Fix deprecation warning for torch.utils._pytree._register_pytree_node in PyTorch 2.2 by @zyinghua in #7008
  • [IP2P] Make text encoder truly optional in InstructPi2Pix by @sayakpaul in #6995
  • IP-Adapter attention masking by @fabiorigano in #6847
  • Fix Pixart Slow Tests by @DN6 in #6962
  • [from_single_file] pass torch_dtype to set_module_tensor_to_device by @yiyixuxu in #6994
  • [Refactor] FreeInit for AnimateDiff based pipelines by @DN6 in #6874
  • [Community Pipelines]Accelerate inference of stable diffusion xl (SDXL) by IPEX on CPU by @ustcuna in #6683
  • Add section on AnimateLCM to docs by @DN6 in #7024
  • IP-Adapter support for StableDiffusionXLControlNetInpaintPipeline by @rootonchair in #6941
  • Supper IP Adapter weight loading in StableDiffusionXLControlNetInpaintPipeline by @tontan2545 in #7031
  • Fix alt text and image links in AnimateLCM docs by @DN6 in #7029
  • Update ControlNet Inpaint single file test by @DN6 in #7022
  • Fix load_model_dict_into_meta for ControlNet from_single_file by @DN6 in #7034
  • Remove disable_full_determinism from StableVideoDiffusion xformers test. by @DN6 in #7039
  • update header by @pravdomil in #6596
  • fix doc example for fom_single_file by @yiyixuxu in #7015
  • Fix typos in text_to_image examples by @standardAI in #7050
  • Update checkpoint_merger pipeline to pass the "variant" argument by @lstein in #6670
  • allow explicit tokenizer & text_encoder in unload_textual_inversion by @H3zi in #6977
  • re-add unet refactor PR by @yiyixuxu in #7044
  • IPAdapterTesterMixin by @a-r-r-o-w in #6862
  • [Refactor] save_model_card function in text_to_image examples by @standardAI in #7051
  • Fix typos by @standardAI in #7068
  • Fix docstring of community pipeline imagic by @chongdashu in #7062
  • Change images to image. The variable images is not used anywhere by @bimsarapathiraja in #7074
  • fix: TensorRTStableDiffusionPipeline cannot set guidance_scale by @caiyueliang in #7065
  • [Refactor] StableDiffusionReferencePipeline inheriting from DiffusionPipeline by @standardAI in #7071
  • Fix truthy-ness condition in pipelines that use denoising_start by @a-r-r-o-w in #6912
  • Fix head_to_batch_dim for IPAdapterAttnProcessor by @fabiorigano in #7077
  • [docs] Minor updates by @stevhliu in #7063
  • Modularize Dreambooth LoRA SD inferencing during and after training by @rootonchair in #6654
  • Modularize Dreambooth LoRA SDXL inferencing during and after training by @rootonchair in #6655
  • [Community] Bug fix + Latest IP-Adapter impl. for AnimateDiff img2vid/controlnet by @a-r-r-o-w in #7086
  • Pass use_linear_projection parameter to mid block in UNetMotionModel by @Stepheni12 in #7035
  • Resize image before crop by @jiqing-feng in #7095
  • Small change to download in dance diffusion convert script by @DN6 in #7070
  • Fix EMA in train_text_to_image_sdxl.py by @standardAI in #7048
  • Make LoRACompatibleConv padding_mode work. by @jinghuan-Chen in #6031
  • [Easy] edit issue and PR templates by @sayakpaul in #7092
  • FIX [PEFT / Core] Copy the state dict when passing it to load_lora_weights by @younesbelkada in #7058
  • [Core] pass revision in the loading_kwargs. by @sayakpaul in #7019
  • [Examples] Multiple enhancements to the ControlNet training scripts by @sayakpaul in #7096
  • move to uv in the Dockerfiles. by @sayakpaul in #7094
  • Add tests to check configs when using single file loading by @DN6 in #7099
  • denormalize latents with the mean and std if available by @patil-suraj in #7111
  • [Dockerfile] remove uv from docker jax tpu by @sayakpaul in #7115
  • Add EDMEulerScheduler by @patil-suraj in #7109
  • add DPM scheduler with EDM formulation by @patil-suraj in #7120
  • [Docs] Fix typos by @standardAI in #7118
  • DPMSolverMultistep add rescale_betas_zero_snr by @Beinsezii in #7097
  • [Tests] make test steps dependent on certain things and general cleanup of the workflows by @sayakpaul in #7026
  • fix kwarg in the SDXL LoRA DreamBooth by @sayakpaul in #7124
  • [Diffusers CI] Switch slow test runners by @DN6 in #7123
  • [stalebot] don't close the issue if the stale label is removed by @yiyixuxu in #7106
  • refactor: move model helper function in pipeline to a mixin class by @ultranity in #6571
  • [docs] unet type hints by @a-r-r-o-w in #7134
  • use uv for installing stuff in the workflows. by @sayakpaul in #7116
  • limit documentation workflow runs for relevant changes. by @sayakpaul in #7125
  • add: support for notifying the maintainers about the docker ci status. by @sayakpaul in #7113
  • Fix setting fp16 dtype in AnimateDiff convert script. by @DN6 in #7127
  • [Docs] Fix typos by @standardAI in #7131
  • [ip-adapter] refactor prepare_ip_adapter_image_embeds and skip load image_encoder by @yiyixuxu in #7016
  • [CI] fix path filtering in the documentation workflows by @sayakpaul in #7153
  • [Urgent][Docker CI] pin uv version for now and a minor change in the Slack notification by @sayakpaul in #7155
  • Fix LCM benchmark test by @sayakpaul in #7158
  • [CI] Remove max parallel flag on slow test runners by @DN6 in #7162
  • Fix vae_encodings_fn hash in train_text_to_image_sdxl.py by @lhoestq in #7171
  • fix: loading problem for sdxl lora dreambooth by @sayakpaul in #7166
  • Map speedup by @kopyl in #6745
  • [stalebot] fix a bug by @yiyixuxu in #7156
  • Support EDM-style training in DreamBooth LoRA SDXL script by @sayakpaul in #7126
  • Fix PixArt 256px inference by @lawrence-cj in #6789
  • [ip-adapter] fix problem using embeds with the plus version of ip adapters by @asomoza in #7189
  • feat: add ip adapter benchmark by @sayakpaul in #6936
  • [Docs] more elaborate example for peft torch.compile by @sayakpaul in #7161
  • adding callback_on_step_end for StableDiffusionLDM3DPipeline by @rootonchair in #7149
  • Update requirements.txt to remove huggingface-cli by @sayakpaul in #7202
  • [advanced dreambooth lora sdxl] add DoRA training feature by @linoytsaban in #7072
  • FIx torch and cuda version in ONNX tests by @DN6 in #7164
  • [training scripts] add tags of diffusers-training by @linoytsaban in #7206
  • fix a bug in from_config by @yiyixuxu in #7192
  • Fix: UNet2DModel::init type hints; fixes issue #4806 by @fpgaminer in #7175
  • Fix typos by @standardAI in #7181
  • Enable PyTorch's FakeTensorMode for EulerDiscreteScheduler scheduler by @thiagocrepaldi in #7151
  • [docs] Improve SVD pipeline docs by @a-r-r-o-w in #7087
  • [Docs] Update callback.md code example by @rootonchair in #7150
  • [Core] errors should be caught as soon as possible. by @sayakpaul in #7203
  • [Community] PromptDiffusion Pipeline by @iczaw in #6752
  • add TCD Scheduler by @mhh0318 in #7174
  • SDXL Turbo support and example launch by @bram-w in #6473
  • [bug] Fix float/int guidance scale not working in StableVideoDiffusionPipeline by @JinayJain in #7143
  • [Pipiline] Wuerstchen v3 aka Stable Cascasde pipeline by @kashif in #6487
  • Update train_dreambooth_lora_sdxl_advanced.py by @landmann in #7227
  • [Core] move out the utilities from pipeline_utils.py by @sayakpaul in #7234
  • Refactor Prompt2Prompt: Inherit from DiffusionPipeline by @ihkap11 in #7211
  • add DoRA training feature to sdxl dreambooth lora script by @linoytsaban in #7235
  • fix: remove duplicated code in TemporalBasicTransformerBlock. by @AsakusaRinne in #7212
  • [Examples] fix: prior preservation setting in DreamBooth LoRA SDXL script. by @sayakpaul in #7242
  • fix: support for loading playground v2.5 single file checkpoint. by @sayakpaul in #7230
  • Raise an error when trying to use SD Cascade Decoder with dtype bfloat16 and torch < 2.2 by @DN6 in #7244
  • Remove the line. Using it create wrong output by @bimsarapathiraja in #7075
  • [docs] Merge LoRAs by @stevhliu in #7213
  • use self.device by @pravdomil in #6595
  • [docs] Community tips by @stevhliu in #7137
  • [Core] throw error when patch inputs and layernorm are provided for Transformers2D by @sayakpaul in #7200
  • [Tests] fix: VAE tiling tests when setting the right device by @sayakpaul in #7246
  • [Utils] Improve " # Copied from ..." statements in the pipelines by @sayakpaul in #6917
  • [Easy] fix: save_model_card utility of the DreamBooth SDXL LoRA script by @sayakpaul in #7258
  • Make mid block optional for flax UNet by @mar-muel in #7083
  • Solve missing clip_sample implementation in FlaxDDIMScheduler. by @hi-sushanta in #7017
  • [Tests] fix config checking tests by @sayakpaul in #7247
  • [docs] IP-Adapter image embedding by @stevhliu in #7226
  • Adds denoising_end parameter to ControlNetPipeline for SDXL by @UmerHA in #6175
  • Add npu support by @MengqingCao in #7144
  • [Community Pipeline] Skip Marigold depth_colored with color_map=None by @qqii in #7170
  • update the signature of from_single_file by @yiyixuxu in #7216
  • [UNet_Spatio_Temporal_Condition] fix default num_attention_heads in unet_spatio_temporal_condition by @Wang-Xiaodong1899 in #7205
  • [docs/nits] Fix return values based on return_dict and minor doc updates by @a-r-r-o-w in #7105
  • [Chore] remove tf mention by @sayakpaul in #7245
  • Fix gmflow_dir by @pravdomil in #6583
  • Support latents_mean and latents_std by @haofanwang in #7132
  • Inline InputPadder by @pravdomil in #6582
  • [Dockerfiles] add: a workflow to check if docker containers can be built in case of modifications by @sayakpaul in #7129
  • instruct pix2pix pipeline: remove sigma scaling when computing classifier free guidance by @erliding in #7006
  • Change export_to_video default by @DN6 in #6990
  • [Chore] switch to logger.warning by @sayakpaul in #7289
  • [LoRA] use the PyTorch classes wherever needed and start depcrecation cycles by @sayakpaul in #7204
  • Add single file support for Stable Cascade by @DN6 in #7274
  • Fix passing pooled prompt embeds to Cascade Decoder and Combined Pipeline by @DN6 in #7287
  • Fix loading Img2Img refiner components in from_single_file by @DN6 in #7282
  • [Chore] clean residue from copy-pasting in the UNet single file loader by @sayakpaul in #7295
  • Update Cascade documentation by @DN6 in #7257
  • Update Stable Cascade Conversion Scripts by @DN6 in #7271
  • [Pipeline] Add LEDITS++ pipelines by @manuelbrack in #6074
  • [PyPI publishing] feat: automate the process of pypi publication to some extent. by @sayakpaul in #7270
  • add: support for notifying maintainers about the nightly test status by @sayakpaul in #7117
  • Fix Wrong Text-encoder Grad Setting in Custom_Diffusion Training by @Rbrq03 in #7302
  • Add Intro page of TCD by @mhh0318 in #7259
  • Fix typos in UNet2DConditionModel documentation by @alexanderbonnet in #7291
  • Change step_offset scheduler docstrings by @Beinsezii in #7128
  • update get_order_list if statement by @kghamilton89 in #7309
  • add: pytest log installation by @sayakpaul in #7313
  • [Tests] Fix incorrect constant in VAE scaling test. by @DN6 in #7301
  • log loss per image by @noskill in #7278
  • add edm schedulers in doc by @patil-suraj in #7319
  • [Advanced DreamBooth LoRA SDXL] Support EDM-style training (follow up of #7126) by @linoytsaban in #7182
  • Update Cascade Tests by @DN6 in #7324
  • Release: v0.27.0 by @DN6 (direct commit on v0.27.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @ihkap11
    • Fix diffusers import prompt2prompt (#6927)
    • Refactor Prompt2Prompt: Inherit from DiffusionPipeline (#7211)
  • @ustcuna
    • [Community Pipelines]Accelerate inference of stable diffusion xl (SDXL) by IPEX on CPU (#6683)
  • @rootonchair
    • IP-Adapter support for StableDiffusionXLControlNetInpaintPipeline (#6941)
    • Modularize Dreambooth LoRA SD inferencing during and after training (#6654)
    • Modularize Dreambooth LoRA SDXL inferencing during and after training (#6655)
    • adding callback_on_step_end for StableDiffusionLDM3DPipeline (#7149)
    • [Docs] Update callback.md code example (#7150)
  • @standardAI
    • Fix typos in text_to_image examples (#7050)
    • [Refactor] save_model_card function in text_to_image examples (#7051)
    • Fix typos (#7068)
    • [Refactor] StableDiffusionReferencePipeline inheriting from DiffusionPipeline (#7071)
    • Fix EMA in train_text_to_image_sdxl.py (#7048)
    • [Docs] Fix typos (#7118)
    • [Docs] Fix typos (#7131)
    • Fix typos (#7181)
  • @a-r-r-o-w
    • IPAdapterTesterMixin (#6862)
    • Fix truthy-ness condition in pipelines that use denoising_start (#6912)
    • [Community] Bug fix + Latest IP-Adapter impl. for AnimateDiff img2vid/controlnet (#7086)
    • [docs] unet type hints (#7134)
    • [docs] Improve SVD pipeline docs (#7087)
    • [docs/nits] Fix return values based on return_dict and minor doc updates (#7105)
  • @ultranity
    • refactor: move model helper function in pipeline to a mixin class (#6571)
  • @iczaw
    • [Community] PromptDiffusion Pipeline (#6752)
  • @mhh0318
    • add TCD Scheduler (#7174)
    • Add Intro page of TCD (#7259)
  • @manuelbrack
    • [Pipeline] Add LEDITS++ pipelines (#6074)

All commits

  • Fix configuring VAE from single file mixin by @DN6 in #6950
  • [DPMSolverSinglestepScheduler] correct get_order_list for solver_order=2and lower_order_final=True by @yiyixuxu in #6953

In v0.26.0, we introduced a bug 🐛 in the BasicTransformerBlock by removing some boolean flags. This caused many popular libraries, such as tomesd, to break. We have fixed that in this release. Thanks to @vladmandic for bringing this to our attention.

All commits

  • add self.use_ada_layer_norm_* params back to BasicTransformerBlock by @yiyixuxu in #6841
diffusers - v0.26.1: Patch release to fix `torchvision` dependency

Published by sayakpaul 9 months ago

In the v0.26.0 release, we slipped in the torchvision library as a required library, which shouldn't have been the case. This is now fixed.

All commits

  • add is_torchvision_available by @yiyixuxu in #6800

This new release comes with two new video pipelines, a more unified and consistent experience for single-file checkpoint loading, support for multiple IP-Adapters’ inference with multiple reference images, and more.

I2VGenXL

I2VGenXL is an image-to-video pipeline, proposed in I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models.

import torch
from diffusers import I2VGenXLPipeline
from diffusers.utils import export_to_gif, load_image

repo_id = "ali-vilab/i2vgen-xl"
pipeline = I2VGenXLPipeline.from_pretrained(repo_id, torch_dtype=torch.float16).to("cuda")
pipeline.enable_model_cpu_offload()

image_url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/i2vgen_xl_images/img_0001.jpg"
image = load_image(image_url).convert("RGB")
prompt = "A green frog floats on the surface of the water on green lotus leaves, with several pink lotus flowers, in a Chinese painting style."
negative_prompt = "Distorted, discontinuous, Ugly, blurry, low resolution, motionless, static, disfigured, disconnected limbs, Ugly faces, incomplete arms"
generator = torch.manual_seed(8888)

frames = pipeline(
    prompt=prompt,
    image=image,
    num_inference_steps=50,
    negative_prompt=negative_prompt,
    generator=generator,
).frames
export_to_gif(frames[0], "i2v.gif")

📜 Check out the docs here.

PIA

PIA is a Personalized Image Animator that aligns with condition images, controls motion by text, and is compatible with various T2I models without specific tuning. PIA uses a base T2I model with temporal alignment layers for image animation. A key component of PIA is the condition module, which transfers appearance information for individual frame synthesis in the latent space, thus allowing a stronger focus on motion alignment. PIA was introduced in PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models.

import torch
from diffusers import (
    EulerDiscreteScheduler,
    MotionAdapter,
    PIAPipeline,
)
from diffusers.utils import export_to_gif, load_image

adapter = MotionAdapter.from_pretrained("openmmlab/PIA-condition-adapter")
pipe = PIAPipeline.from_pretrained("SG161222/Realistic_Vision_V6.0_B1_noVAE", motion_adapter=adapter, torch_dtype=torch.float16)

pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
pipe.enable_vae_slicing()

image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/pix2pix/cat_6.png?download=true"
)
image = image.resize((512, 512))
prompt = "cat in a field"
negative_prompt = "wrong white balance, dark, sketches,worst quality,low quality"

generator = torch.Generator("cpu").manual_seed(0)
output = pipe(image=image, prompt=prompt, generator=generator)
frames = output.frames[0]
export_to_gif(frames, "pia-animation.gif")

📜 Check out the docs here.

Multiple IP-Adapters + Multiple reference images support (“Instant LoRA” Feature)

IP-Adapters are becoming quite popular, so we have added support for performing inference with multiple IP-Adapters and multiple reference images! Thanks to @asomoza for their help. Get started with the code below:

import torch
from diffusers import AutoPipelineForText2Image, DDIMScheduler
from transformers import CLIPVisionModelWithProjection
from diffusers.utils import load_image

image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter", 
    subfolder="models/image_encoder",
    torch_dtype=torch.float16,
)

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    image_encoder=image_encoder,
)
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)

pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name=["ip-adapter-plus_sdxl_vit-h.safetensors", "ip-adapter-plus-face_sdxl_vit-h.safetensors"])
pipeline.set_ip_adapter_scale([0.7, 0.3])

pipeline.enable_model_cpu_offload()

face_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/women_input.png")

style_folder = "https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/style_ziggy"
style_images =  [load_image(f"{style_folder}/img{i}.png") for i in range(10)]

generator = torch.Generator(device="cpu").manual_seed(0)

image = pipeline(
    prompt="wonderwoman",
    ip_adapter_image=[style_images, face_image],
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality", 
    num_inference_steps=50,
    generator=generator,
).images[0]

Reference style images:

📜 Check out the docs here.

Single-file checkpoint loading

The from_single_file() utility has been refactored for better readability and to follow semantics similar to from_pretrained(). Support for loading single-file checkpoints and configs from URLs has also been added.
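As a minimal sketch (the checkpoint URL here is only an illustrative example; any accessible .safetensors URL or local path should work), loading an SDXL checkpoint from a single file now looks like this:

import torch
from diffusers import StableDiffusionXLPipeline

# Example single-file checkpoint hosted on the Hub (assumed example URL).
ckpt_url = "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0.safetensors"
pipe = StableDiffusionXLPipeline.from_single_file(ckpt_url, torch_dtype=torch.float16).to("cuda")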

DPM scheduler fix

We introduced a fix for the DPM schedulers, so you can now use them with SDXL to generate high-quality images in fewer steps than with the Euler scheduler.
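As a rough sketch (the model ID and step count are chosen only for illustration), swapping the DPM scheduler into an SDXL pipeline looks like this:

import torch
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# Swap the default scheduler for DPM-Solver++ with Karras sigmas.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)

image = pipe("An astronaut riding a green horse", num_inference_steps=25).images[0]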

Apart from these, we have done a number of refactors to improve the library design and will continue to do so in the coming days.

All commits

  • [docs] Fix missing API function by @stevhliu in #6604
  • Fix failing tests due to Posix Path by @DN6 in #6627
  • Update convert_from_ckpt.py / read checkpoint config yaml contents by @spezialspezial in #6633
  • [Community] Experimental AnimateDiff Image to Video (open to improvements) by @a-r-r-o-w in #6509
  • refactor: extract init/forward function in UNet2DConditionModel by @ultranity in #6478
  • Modularize InstructPix2Pix SDXL inferencing during and after training in examples by @sang-k in #6569
  • Fixed the bug related to saving DeepSpeed models. by @HelloWorldBeginner in #6628
  • fix DPM Scheduler with use_karras_sigmas option by @yiyixuxu in #6477
  • fix SDXL-kdiffusion tests by @yiyixuxu in #6647
  • add padding_mask_crop to all inpaint pipelines by @rootonchair in #6360
  • add Sa-Solver by @lawrence-cj in #5975
  • Add tearDown method to LoRA tests. by @DN6 in #6660
  • [Diffusion DPO] apply fixes from #6547 by @sayakpaul in #6668
  • Update README by @standardAI in #6669
  • [Big refactor] move unets to unets module 🦋 by @sayakpaul in #6630
  • Standardise outputs for video pipelines by @DN6 in #6626
  • fix dpm related slow test failure by @yiyixuxu in #6680
  • [Tests] Test for passing local config file to from_single_file() by @sayakpaul in #6638
  • [Refactor] Update from single file by @DN6 in #6428
  • [WIP][Community Pipeline] InstaFlow! One-Step Stable Diffusion with Rectified Flow by @ayushtues in #6057
  • Add InstantID Pipeline by @haofanwang in #6673
  • [Docs] update: tutorials ja | AUTOPIPELINE.md by @YasunaCoffee in #6629
  • [Fix bugs] pipeline_controlnet_sd_xl.py by @haofanwang in #6653
  • SD 1.5 Support For Advanced Lora Training (train_dreambooth_lora_sdxl_advanced.py) by @brandostrong in #6449
  • AnimateDiff Video to Video by @a-r-r-o-w in #6328
  • [docs] UViT2D by @stevhliu in #6643
  • Correct sigmas cpu settings by @patrickvonplaten in #6708
  • [docs] AnimateDiff Video-to-Video by @a-r-r-o-w in #6712
  • fix community README by @a-r-r-o-w in #6645
  • fix custom diffusion training with concept list by @AIshutin in #6710
  • Add IP Adapters to slow tests by @DN6 in #6714
  • Move tests for SD inference variant pipelines into their own modules by @DN6 in #6707
  • Add Community Example Consistency Training Script by @dg845 in #6717
  • Add UFOGenScheduler to Community Examples by @dg845 in #6650
  • [Hub] feat: explicitly tag to diffusers when using push_to_hub by @sayakpaul in #6678
  • Correct SNR weighted loss in v-prediction case by only adding 1 to SNR on the denominator by @thuliu-yt16 in #6307
  • changed to posix unet by @gzguevara in #6719
  • Change os.path to pathlib Path by @Stepheni12 in #6737
  • correct hflip arg by @sayakpaul in #6743
  • Add unload_textual_inversion method by @fabiorigano in #6656
  • [Core] move transformer scripts to transformers modules by @sayakpaul in #6747
  • Update lora.md with a more accurate description of rank by @xhedit in #6724
  • Fix mixed precision fine-tuning for text-to-image-lora-sdxl example. by @sajadn in #6751
  • udpate ip-adapter slow tests by @yiyixuxu in #6760
  • Update export to video to support new tensor_to_vid function in video pipelines by @DN6 in #6715
  • [DDPMScheduler] Load alpha_cumprod to device to avoid redundant data movement. by @woshiyyya in #6704
  • Fix bug in ResnetBlock2D.forward where LoRA Scale gets Overwritten by @dg845 in #6736
  • add note about serialization by @sayakpaul in #6764
  • Update train_diffusion_dpo.py by @viettmab in #6754
  • Pin torch < 2.2.0 in test runners by @DN6 in #6780
  • [Kandinsky tests] add is_flaky to test_model_cpu_offload_forward_pass by @sayakpaul in #6762
  • add ipo, hinge and cpo loss to dpo trainer by @kashif in #6788
  • Fix setting scaling factor in VAE config by @DN6 in #6779
  • Add PIA Model/Pipeline by @DN6 in #6698
  • [docs] Add missing parameter by @stevhliu in #6775
  • [IP-Adapter] Support multiple IP-Adapters by @yiyixuxu in #6573
  • [sdxl k-diffusion pipeline]move sigma to device by @yiyixuxu in #6757
  • [Feat] add I2VGenXL for image-to-video generation by @sayakpaul in #6665
  • Release: v0.26.0 by @ (direct commit on v0.26.0-release)
  • fix torchvision import by @patrickvonplaten in #6796

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @a-r-r-o-w
    • [Community] Experimental AnimateDiff Image to Video (open to improvements) (#6509)
    • AnimateDiff Video to Video (#6328)
    • [docs] AnimateDiff Video-to-Video (#6712)
    • fix community README (#6645)
  • @ultranity
    • refactor: extract init/forward function in UNet2DConditionModel (#6478)
  • @lawrence-cj
    • add Sa-Solver (#5975)
  • @ayushtues
    • [WIP][Community Pipeline] InstaFlow! One-Step Stable Diffusion with Rectified Flow (#6057)
  • @haofanwang
    • Add InstantID Pipeline (#6673)
    • [Fix bugs] pipeline_controlnet_sd_xl.py (#6653)
  • @brandostrong
    • SD 1.5 Support For Advanced Lora Training (train_dreambooth_lora_sdxl_advanced.py) (#6449)
  • @dg845
    • Add Community Example Consistency Training Script (#6717)
    • Add UFOGenScheduler to Community Examples (#6650)
    • Fix bug in ResnetBlock2D.forward where LoRA Scale gets Overwritten (#6736)
diffusers - Patch release

Published by patrickvonplaten 9 months ago

Make sure diffusers can correctly be used in offline mode again: https://github.com/huggingface/diffusers/pull/1767#issuecomment-1896194917

  • Respect offline mode when loading pipeline by @Wauplin in #6456
  • Fix offline mode import by @Wauplin in #6467
diffusers - v0.25.0: aMUSEd, faster SDXL, interruptable pipelines

Published by sayakpaul 10 months ago

aMUSEd

collage_full

aMUSEd is a lightweight text-to-image model based on the MUSE architecture. aMUSEd is particularly useful in applications that require a lightweight and fast model, such as generating many images quickly at once. aMUSEd is currently a research release.

aMUSEd is a VQVAE token-based transformer that can generate an image in fewer forward passes than many diffusion models. In contrast with MUSE, it uses the smaller text encoder CLIP-L/14 instead of T5-XXL. Due to its small parameter count and few-forward-pass generation process, aMUSEd can generate many images quickly. This benefit is seen particularly at larger batch sizes.

Text-to-image generation

import torch
from diffusers import AmusedPipeline

pipe = AmusedPipeline.from_pretrained(
    "amused/amused-512", variant="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "cowboy"
image = pipe(prompt, generator=torch.manual_seed(8)).images[0]
image.save("text2image_512.png")

Image-to-image generation

import torch
from diffusers import AmusedImg2ImgPipeline
from diffusers.utils import load_image

pipe = AmusedImg2ImgPipeline.from_pretrained(
    "amused/amused-512", variant="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "apple watercolor"
input_image = (
    load_image(
        "https://huggingface.co/amused/amused-512/resolve/main/assets/image2image_256_orig.png"
    )
    .resize((512, 512))
    .convert("RGB")
)

image = pipe(prompt, input_image, strength=0.7, generator=torch.manual_seed(3)).images[0]
image.save("image2image_512.png")

Inpainting

import torch
from diffusers import AmusedInpaintPipeline
from diffusers.utils import load_image
from PIL import Image

pipe = AmusedInpaintPipeline.from_pretrained(
    "amused/amused-512", variant="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "a man with glasses"
input_image = (
    load_image(
        "https://huggingface.co/amused/amused-512/resolve/main/assets/inpainting_256_orig.png"
    )
    .resize((512, 512))
    .convert("RGB")
)
mask = (
    load_image(
        "https://huggingface.co/amused/amused-512/resolve/main/assets/inpainting_256_mask.png"
    )
    .resize((512, 512))
    .convert("L")
)    

image = pipe(prompt, input_image, mask, generator=torch.manual_seed(3)).images[0]
image.save(f"inpainting_512.png")

📜 Docs: https://huggingface.co/docs/diffusers/main/en/api/pipelines/amused

🛠️ Models:

Faster SDXL

We’re excited to present an array of optimization techniques that can be used to accelerate the inference latency of text-to-image diffusion models. All of these can be done in native PyTorch without requiring additional C++ code.

SDXL_Batch_Size__1_Steps__30

These techniques are not specific to Stable Diffusion XL (SDXL) and can be used to improve other text-to-image diffusion models too. Starting from default fp32 precision, we can achieve a 3x speed improvement by applying different PyTorch optimization techniques. We encourage you to check out the detailed docs provided below.

Note: Compared to the default way most people use Diffusers, which is fp16 + SDPA, applying all the optimizations explained in the blog below yields a 30% speed-up.

📜 Docs: https://huggingface.co/docs/diffusers/main/en/tutorials/fast_diffusion
🌠 PyTorch blog post: https://pytorch.org/blog/accelerating-generative-ai-3/
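As a small sketch of the kind of optimizations covered there (fp16 weights plus torch.compile on the UNet; the exact speed-up depends on your GPU and PyTorch version):

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# Compile the UNet, the most compute-heavy component; the first call is slow while compilation happens.
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

image = pipe("a photo of an astronaut riding a horse on mars", num_inference_steps=30).images[0]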

Interruptible pipelines

Interrupting the diffusion process is particularly useful when building UIs that work with Diffusers because it allows users to stop the generation process if they're unhappy with the intermediate results. You can incorporate this into your pipeline with a callback.

This callback function should take the following arguments: pipe, i, t, and callback_kwargs (this must be returned). Set the pipeline's _interrupt attribute to True to stop the diffusion process after a certain number of steps. You are also free to implement your own custom stopping logic inside the callback.

In this example, the diffusion process is stopped after 10 steps even though num_inference_steps is set to 50.

from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.enable_model_cpu_offload()
num_inference_steps = 50

def interrupt_callback(pipe, i, t, callback_kwargs):
    stop_idx = 10
    if i == stop_idx:
        pipe._interrupt = True

    return callback_kwargs

pipe(
    "A photo of a cat",
    num_inference_steps=num_inference_steps,
    callback_on_step_end=interrupt_callback,
)

📜 Docs: https://huggingface.co/docs/diffusers/main/en/using-diffusers/callback

peft in our LoRA training examples

We incorporated peft in all the officially supported training examples concerning LoRA. This greatly simplifies the code and improves readability. LoRA training has never been easier, thanks to peft!

More memory-friendly version of LCM LoRA SDXL training

We incorporated best practices from peft to make LCM LoRA training for SDXL more memory-friendly. As such, you don't have to initialize two UNets (teacher and student) anymore. This version also integrates with the datasets library for quick experimentation. Check out this section for more details.

All commits

  • [docs] Fix video link by @stevhliu in #5986
  • Fix LLMGroundedDiffusionPipeline super class arguments by @KristianMischke in #5993
  • Remove a duplicated line? by @sweetcocoa in #6010
  • [examples/advanced_diffusion_training] bug fixes and improvements for LoRA Dreambooth SDXL advanced training script by @linoytsaban in #5935
  • [advanced_dreambooth_lora_sdxl_tranining_script] readme fix by @linoytsaban in #6019
  • [docs] Fix SVD video by @stevhliu in #6004
  • [Easy] minor edits to setup.py by @sayakpaul in #5996
  • [From Single File] Allow Text Encoder to be passed by @patrickvonplaten in #6020
  • [Community Pipeline] Regional Prompting Pipeline by @hako-mikan in #6015
  • [logging] Fix assertion bug by @standardAI in #6012
  • [Docs] Update a link by @standardAI in #6014
  • added attention_head_dim, attention_type, resolution_idx by @charchit7 in #6011
  • fix style by @patrickvonplaten (direct commit on v0.25.0)
  • [Kandinsky 3.0] Follow-up TODOs by @yiyixuxu in #5944
  • [schedulers] create self.sigmas during init by @yiyixuxu in #6006
  • Post Release: v0.24.0 by @patrickvonplaten in #5985
  • LLMGroundedDiffusionPipeline: inherit from DiffusionPipeline and fix peft by @TonyLianLong in #6023
  • adapt PixArtAlphaPipeline for pixart-lcm model by @lawrence-cj in #5974
  • [PixArt Tests] remove fast tests from slow suite by @sayakpaul in #5945
  • [LoRA serialization] fix: duplicate unet prefix problem. by @sayakpaul in #5991
  • [advanced dreambooth lora sdxl training script] improve help tags by @linoytsaban in #6035
  • fix StableDiffusionTensorRT super args error by @gujingit in #6009
  • Update value_guided_sampling.py by @Parth38 in #6027
  • Update Tests Fetcher by @DN6 in #5950
  • Add variant argument to dreambooth lora sdxl advanced by @levi in #6021
  • [Feature] Support IP-Adapter Plus by @okotaku in #5915
  • [Community Pipeline] DemoFusion: Democratising High-Resolution Image Generation With No $$$ by @RuoyiDu in #6022
  • [advanced dreambooth lora training script][bug_fix] change token_abstraction type to str by @linoytsaban in #6040
  • [docs] Add Kandinsky 3 by @stevhliu in #5988
  • [docs] #Copied from mechanism by @stevhliu in #6007
  • Move kandinsky convert script by @DN6 in #6047
  • Pin Ruff Version by @DN6 in #6059
  • Ldm unet convert fix by @DN6 in #6038
  • Fix demofusion by @radames in #6049
  • [From single file] remove depr warning by @patrickvonplaten in #6043
  • [advanced_dreambooth_lora_sdxl_tranining_script] save embeddings locally fix by @apolinario in #6058
  • Device agnostic testing by @arsalanu in #5612
  • [feat] allow SDXL pipeline to run with fused QKV projections by @sayakpaul in #6030
  • fix by @DN6 (direct commit on v0.25.0)
  • Use CC12M for LCM WDS training example by @pcuenca in #5908
  • Disable Tests Fetcher by @DN6 in #6060
  • [Advanced Diffusion Training] Cache latents to avoid VAE passes for every training step by @apolinario in #6076
  • [Euler Discrete] Fix sigma by @patrickvonplaten in #6078
  • Harmonize HF environment variables + deprecate use_auth_token by @Wauplin in #6066
  • [docs] SDXL Turbo by @stevhliu in #6065
  • Add ControlNet-XS support by @UmerHA in #5827
  • Fix typing inconsistency in Euler discrete scheduler by @iabaldwin in #6052
  • [PEFT] Adapt example scripts to use PEFT by @younesbelkada in #5388
  • Fix clearing backend cache from device agnostic testing by @DN6 in #6075
  • [Community] AnimateDiff + Controlnet Pipeline by @a-r-r-o-w in #5928
  • EulerDiscreteScheduler add rescale_betas_zero_snr by @Beinsezii in #6024
  • Add support for IPAdapterFull by @fabiorigano in #5911
  • Fix a bug in add_noise function by @yiyixuxu in #6085
  • [Advanced Diffusion Script] Add Widget default text by @apolinario in #6100
  • [Advanced Training Script] Fix pipe example by @apolinario in #6106
  • IP-Adapter for StableDiffusionControlNetImg2ImgPipeline by @charchit7 in #5901
  • IP adapter support for most pipelines by @a-r-r-o-w in #5900
  • Correct type annotation for VaeImageProcessor.numpy_to_pil by @edwardwli in #6111
  • [Docs] Fix typos by @standardAI in #6122
  • [feat: Benchmarking Workflow] add stuff for a benchmarking workflow by @sayakpaul in #5839
  • [Community] Add SDE Drag pipeline by @Monohydroxides in #6105
  • [docs] IP-Adapter API doc by @stevhliu in #6140
  • Add missing subclass docs, Fix broken example in SD_safe by @a-r-r-o-w in #6116
  • [advanced dreambooth lora sdxl training script] load pipeline for inference only if validation prompt is used by @linoytsaban in #6171
  • [docs] Add missing \ in lora.md by @pierd in #6174
  • [Sigmas] Keep sigmas on CPU by @patrickvonplaten in #6173
  • LoRA test fixes by @DN6 in #6163
  • Add PEFT to training deps by @DN6 in #6148
  • Clean Up Comments in LCM(-LoRA) Distillation Scripts. by @dg845 in #6145
  • Compile test fix by @DN6 in #6104
  • [LoRA] add an error message when dealing with _best_guess_weight_name ofline by @sayakpaul in #6184
  • [Core] feat: enable fused attention projections for other SD and SDXL pipelines by @sayakpaul in #6179
  • [Benchmarks] fix: lcm benchmarking reporting by @sayakpaul in #6198
  • [Refactor autoencoders] feat: introduce autoencoders module by @sayakpaul in #6129
  • Fix the test script in examples/text_to_image/README.md by @krahets in #6209
  • Nit fix to training params by @osanseviero in #6200
  • [Training] remove depcreated method from lora scripts. by @sayakpaul in #6207
  • Fix SDXL Inpainting from single file with Refiner Model by @DN6 in #6147
  • Fix possible re-conversion issues after extracting from safetensors by @d8ahazard in #6097
  • Fix t2i. blog url by @abinthomasonline in #6205
  • [Text-to-Video] Clean up pipeline by @patrickvonplaten in #6213
  • [Torch Compile] Fix torch compile for svd vae by @patrickvonplaten in #6217
  • Deprecate Pipelines by @DN6 in #6169
  • Update README.md by @TilmannR in #6191
  • Support img2img and inpaint in lpw-xl by @a-r-r-o-w in #6114
  • Update train_text_to_image_lora.py by @haofanwang in #6144
  • [SVD] Fix guidance scale by @patrickvonplaten in #6002
  • Slow Test for Pipelines minor fixes by @DN6 in #6221
  • Add converter method for ip adapters by @fabiorigano in #6150
  • offload the optional module image_encoder by @yiyixuxu in #6151
  • fix: init for vae during pixart tests by @sayakpaul in #6215
  • [T2I LoRA training] fix: unscale fp16 gradient problem by @sayakpaul in #6119
  • ControlNetXS fixes. by @DN6 in #6228
  • add peft dependency to fast push tests by @sayakpaul in #6229
  • [refactor embeddings]pixart-alpha by @yiyixuxu in #6212
  • [Docs] Fix a code example in the ControlNet Inpainting documentation by @raven38 in #6236
  • [docs] Batched seeds by @stevhliu in #6237
  • [Fix] Fix Regional Prompting Pipeline by @hako-mikan in #6188
  • EulerAncestral add rescale_betas_zero_snr by @Beinsezii in #6187
  • [Refactor upsamplers and downsamplers] separate out upsamplers and downsamplers. by @sayakpaul in #6128
  • Bump transformers from 4.34.0 to 4.36.0 in /examples/research_projects/realfill by @dependabot[bot] in #6255
  • fix: unscale fp16 gradient problem & potential error by @lvzii in #6086)
  • [Refactor] move diffedit out of stable_diffusion by @sayakpaul in #6260
  • move attend and excite out of stable_diffusion by @sayakpaul (direct commit on v0.25.0)
  • Revert "move attend and excite out of stable_diffusion" by @sayakpaul (direct commit on v0.25.0)
  • [Training] remove depcreated method from lora scripts again by @Yimi81 in #6266
  • [Refactor] move k diffusion out of stable_diffusion by @sayakpaul in #6267
  • [Refactor] move gligen out of stable diffusion. by @sayakpaul in #6265
  • [Refactor] move sag out of stable_diffusion by @sayakpaul in #6264
  • TST Fix LoRA test that fails with PEFT >= 0.7.0 by @BenjaminBossan in #6216
  • [Refactor] move attend and excite out of stable_diffusion. by @sayakpaul in #6261
  • [Refactor] move panorama out of stable_diffusion by @sayakpaul in #6262
  • [Deprecated pipelines] remove pix2pix zero from init by @sayakpaul in #6268
  • [Refactor] move ldm3d out of stable_diffusion. by @sayakpaul in #6263
  • open muse by @williamberman in #5437
  • Remove ONNX inpaint legacy by @DN6 in #6269
  • Remove peft tests from old lora backend tests by @DN6 in #6273
  • Allow diffusers to load with Flax, w/o PyTorch by @pcuenca in #6272
  • [Community Pipeline] Add Marigold Monocular Depth Estimation by @markkua in #6249
  • Fix Prodigy optimizer in SDXL Dreambooth script by @apolinario in #6290
  • [LoRA PEFT] fix LoRA loading so that correct alphas are parsed by @sayakpaul in #6225
  • LoRA Unfusion test fix by @DN6 in #6291
  • Fix typos in the ValueError for a nested image list as StableDiffusionControlNetPipeline input. by @celestialphineas in #6286
  • fix RuntimeError: Input type (float) and bias type (c10::Half) should be the same in train_text_to_image_lora.py by @mwkldeveloper in #6259
  • fix: t2i apdater paper link by @sayakpaul in #6314
  • fix: lora peft dummy components by @sayakpaul in #6308
  • [Tests] Speed up example tests by @sayakpaul in #6319
  • fix: cannot set guidance_scale by @Jannchie in #6326
  • Change LCM-LoRA README Script Example Learning Rates to 1e-4 by @dg845 in #6304
  • [Peft] fix saving / loading when unet is not "unet" by @kashif in #6046
  • [Wuerstchen] fix fp16 training and correct lora args by @kashif in #6245
  • [docs] fix: animatediff docs by @sayakpaul in #6339
  • [Training] Add datasets version of LCM LoRA SDXL by @sayakpaul in #5778
  • [Peft / Lora] Add adapter_names in fuse_lora by @younesbelkada in #5823
  • [Diffusion fast] add doc for diffusion fast by @sayakpaul in #6311
  • Add rescale_betas_zero_snr Argument to DDPMScheduler by @dg845 in #6305
  • Interruptable Pipelines by @DN6 in #5867
  • Update Animatediff docs by @DN6 in #6341
  • Add AnimateDiff conversion scripts by @DN6 in #6340
  • amused other pipelines docs by @williamberman in #6343
  • [Docs] fix: video rendering on svd. by @sayakpaul in #6330
  • [SDXL-IP2P] Update README_sdxl, Replace the link for wandb log with the correct run by @priprapre in #6270
  • adding auto1111 features to inpainting pipeline by @yiyixuxu in #6072
  • Remove unused parameters and fixed FutureWarning by @Justin900429 in #6317
  • amused update links to new repo by @williamberman in #6344
  • [LoRA] make LoRAs trained with peft loadable when peft isn't installed by @sayakpaul in #6306
  • Move ControlNetXS into Community Folder by @DN6 in #6316
  • fix: use retrieve_latents by @Jannchie in #6337
  • Fix LCM distillation bug when creating the guidance scale embeddings using multiple GPUs. by @dg845 in #6279
  • Fix "push_to_hub only create repo in consistency model lora SDXL training script" by @aandyw in #6102
  • Fix chunking in SVD by @DN6 in #6350
  • Add PEFT to advanced training script by @apolinario in #6294
  • Release: v0.25.0 by @sayakpaul (direct commit on v0.25.0)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @hako-mikan
    • [Community Pipeline] Regional Prompting Pipeline (#6015)
    • [Fix] Fix Regional Prompting Pipeline (#6188)
  • @TonyLianLong
    • LLMGroundedDiffusionPipeline: inherit from DiffusionPipeline and fix peft (#6023)
  • @okotaku
    • [Feature] Support IP-Adapter Plus (#5915)
  • @RuoyiDu
    • [Community Pipeline] DemoFusion: Democratising High-Resolution Image Generation With No $$$ (#6022)
  • @UmerHA
    • Add ControlNet-XS support (#5827)
  • @a-r-r-o-w
    • [Community] AnimateDiff + Controlnet Pipeline (#5928)
    • IP adapter support for most pipelines (#5900)
    • Add missing subclass docs, Fix broken example in SD_safe (#6116)
    • Support img2img and inpaint in lpw-xl (#6114)
  • @Monohydroxides
    • [Community] Add SDE Drag pipeline (#6105)
  • @dg845
    • Clean Up Comments in LCM(-LoRA) Distillation Scripts. (#6145)
    • Change LCM-LoRA README Script Example Learning Rates to 1e-4 (#6304)
    • Add rescale_betas_zero_snr Argument to DDPMScheduler (#6305)
    • Fix LCM distillation bug when creating the guidance scale embeddings using multiple GPUs. (#6279)
  • @markkua
    • [Community Pipeline] Add Marigold Monocular Depth Estimation (#6249)
diffusers - v0.24.0: IP Adapters, Kandinsky 3.0, Stable Video Diffusion, SDXL Turbo

Published by patrickvonplaten 11 months ago

Stable Video Diffusion, SDXL Turbo, IP Adapters, Kandinsky 3.0

Stable Diffusion Video

Stable Video Diffusion is a powerful image-to-video generation model that can generate high-resolution (576x1024), 2-4 second videos conditioned on an input image.

Image to Video Generation

There are two variants of SVD: SVD and SVD-XT. The SVD checkpoint is trained to generate 14 frames and the SVD-XT checkpoint is further finetuned to generate 25 frames.

You need to condition the generation on an initial image, as follows:

import torch

from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16, variant="fp16"
)
pipe.enable_model_cpu_offload()

# Load the conditioning image
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png?download=true")
image = image.resize((1024, 576))

generator = torch.manual_seed(42)
frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]

export_to_video(frames, "generated.mp4", fps=7)

Since generating videos is more memory intensive, we can use the decode_chunk_size argument to control how many frames are decoded at once. This will reduce the memory usage. It's recommended to tweak this value based on your GPU memory. Setting decode_chunk_size=1 will decode one frame at a time and will use the least amount of memory, but the video might have some flickering.

Additionally, we use model CPU offloading to reduce the memory usage.

rocket_generated

SDXL Turbo

SDXL Turbo is an adversarial time-distilled Stable Diffusion XL (SDXL) model capable of running inference in as little as 1 step. Also, it does not use classifier-free guidance, further increasing its speed. On a good consumer GPU, you can now generate an image in just 100ms.

Text-to-Image

For text-to-image, pass a text prompt. By default, SDXL Turbo generates a 512x512 image, and that resolution gives the best results. You can try setting the height and width parameters to 768x768 or 1024x1024, but you should expect quality degradations when doing so.

Make sure to set guidance_scale to 0.0 to disable classifier-free guidance, as the model was trained without it. A single inference step is enough to generate high-quality images. Increasing the number of steps to 2, 3, or 4 should improve image quality.

from diffusers import AutoPipelineForText2Image
import torch

pipeline_text2image = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16")
pipeline_text2image = pipeline_text2image.to("cuda")

prompt = "A cinematic shot of a baby racoon wearing an intricate italian priest robe."

image = pipeline_text2image(prompt=prompt, guidance_scale=0.0, num_inference_steps=1).images[0]
image

Image-to-image

For image-to-image generation, make sure that num_inference_steps * strength is larger than or equal to 1. The image-to-image pipeline will run for int(num_inference_steps * strength) steps, e.g. int(2 * 0.5) = 1 step in our example below.

from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image, make_image_grid

# use from_pipe to avoid consuming additional memory when loading a checkpoint
pipeline = AutoPipelineForImage2Image.from_pipe(pipeline_text2image).to("cuda")

init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")
init_image = init_image.resize((512, 512))

prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"

image = pipeline(prompt, image=init_image, strength=0.5, guidance_scale=0.0, num_inference_steps=2).images[0]
make_image_grid([init_image, image], rows=1, cols=2)

IP Adapters

IP Adapters have shown to be remarkably powerful at generating images conditioned on other images.

Thanks to @okotaku, we have added IP adapters to the most important pipelines, allowing you to combine them for a variety of different workflows, e.g. they work with Img2Img, ControlNet, and LCM-LoRA out of the box.

LCM-LoRA

from diffusers import DiffusionPipeline, LCMScheduler
import torch
from diffusers.utils import load_image

model_id =  "sd-dreambooth-library/herge-style"
lcm_lora_id = "latent-consistency/lcm-lora-sdv1-5"

pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)

pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe.load_lora_weights(lcm_lora_id)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

prompt = "best quality, high quality"
image = load_image("https://user-images.githubusercontent.com/24734142/266492875-2d50d223-8475-44f0-a7c6-08b51cb53572.png")
image = pipe(
    prompt=prompt,
    ip_adapter_image=image,
    num_inference_steps=4,
    guidance_scale=1,
).images[0]

yiyi_test_2_out

ControlNet

from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import torch
from diffusers.utils import load_image

controlnet_model_path = "lllyasviel/control_v11f1p_sd15_depth"
controlnet = ControlNetModel.from_pretrained(controlnet_model_path, torch_dtype=torch.float16)

pipeline = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16)
pipeline.to("cuda")

image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/statue.png")
depth_map = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/depth.png")

pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")

generator = torch.Generator(device="cpu").manual_seed(33)
images = pipeline(
    prompt='best quality, high quality', 
    image=depth_map,
    ip_adapter_image=image,
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality", 
    num_inference_steps=50,
    generator=generator,
).images
images[0].save("yiyi_test_2_out.png")
IP-Adapter image | depth condition | output

For more information:

Kandinsky 3.0

Kandinsky has released the 3rd version, which has much improved text-to-image alignment thanks to using Flan-T5 as the text encoder.

Text-to-Image

from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()
        
prompt = "A photograph of the inside of a subway train. There are raccoons sitting on the seats. One of them is reading a newspaper. The window shows the city in the background."

generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(prompt, num_inference_steps=25, generator=generator).images[0]

Image-to-Image

from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
import torch

pipe = AutoPipelineForImage2Image.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()
        
prompt = "A painting of the inside of a subway train with tiny raccoons."
image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky3/t2i.png")

generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(prompt, image=image, strength=0.75, num_inference_steps=25, generator=generator).images[0]

Check it out:

All commits

  • LCM-LoRA docs by @patil-suraj in #5782
  • [Docs] Update and make improvements by @standardAI in #5819
  • [docs] Fix title by @stevhliu in #5831
  • Improve setup.py and add dependency check by @patrickvonplaten in #5826
  • [Docs] add: japanese sdxl as a reference by @sayakpaul in #5844
  • Set usedforsecurity=False in hashlib methods (FIPS compliance) by @Wauplin in #5790
  • fix memory consistency decoder test by @williamberman in #5828
  • [PEFT] Unpin peft by @patrickvonplaten in #5850
  • Speed up the peft lora unload by @pacman100 in #5741
  • [Tests/LoRA/PEFT] Test also on PEFT / transformers / accelerate latest by @younesbelkada in #5820
  • UnboundLocalError in SDXLInpaint.prepare_latents() by @a-r-r-o-w in #5648
  • [ControlNet] fix import in single file loading by @sayakpaul in #5834
  • [Styling] stylify using ruff by @kashif in #5841
  • [Community] [WIP] LCM Interpolation Pipeline by @a-r-r-o-w in #5767
  • [JAX] Replace uses of jax.devices("cpu") with jax.local_devices(backend="cpu") by @hvaara in #5864
  • [test / peft] Fix silent behaviour on PR tests by @younesbelkada in #5852
  • fix an issue that ipex occupy too much memory, it will not impact per… by @linlifan in #5625
  • Update LCMScheduler Inference Timesteps to be More Evenly Spaced by @dg845 in #5836
  • Revert "[Docs] Update and make improvements" by @standardAI in #5858
  • [docs] Loader APIs by @stevhliu in #5813
  • Update README.md by @co63oc in #5855
  • Add tests fetcher by @DN6 in #5848
  • Addition of new callbacks to controlnets by @a-r-r-o-w in #5812
  • [docs] MusicLDM by @stevhliu in #5854
  • Add features to the Dreambooth LoRA SDXL training script by @linoytsaban in #5508
  • [feat] IP Adapters (author @okotaku ) by @yiyixuxu in #5713
  • [Lora] Seperate logic by @patrickvonplaten in #5809
  • ControlNet+Adapter pipeline, and ControlNet+Adapter+Inpaint pipeline by @affromero in #5869
  • Adds an advanced version of the SD-XL DreamBooth LoRA training script supporting pivotal tuning by @linoytsaban in #5883
  • [bug fix] fix small bug in readme template of sdxl lora training script by @linoytsaban in #5906
  • [bug fix] fix small bug in readme template of sdxl lora training script by @linoytsaban in #5914
  • [Docs] add: 8bit inference with pixart alpha by @sayakpaul in #5814
  • [@cene555][Kandinsky 3.0] Add Kandinsky 3.0 by @patrickvonplaten in #5913
  • [Examples] Allow downloading variant model files by @patrickvonplaten in #5531
  • [Fix: pixart-alpha] random 512px resolution bug by @lawrence-cj in #5842
  • [Core] add support for gradient checkpointing in transformer_2d by @sayakpaul in #5943
  • Deprecate KarrasVeScheduler and ScoreSdeVpScheduler by @a-r-r-o-w in #5269
  • Add Custom Timesteps Support to LCMScheduler and Supported Pipelines by @dg845 in #5874
  • set the model to train state before accelerator prepare by @sywangyi in #5099
  • Avoid computing min() that is expensive when do_normalize is False in the image processor by @ivanprado in #5896
  • Fix LCM Stable Diffusion distillation bug related to parsing unet_time_cond_proj_dim by @dg845 in #5893
  • add LoRA weights load and fuse support for IPEX pipeline by @linlifan in #5920
  • Replace multiple variables with one variable. by @hi-sushanta in #5715
  • fix: error on device for lpw_stable_diffusion_xl pipeline if pipe.enable_sequential_cpu_offload() enabled by @VicGrygorchyk in #5885
  • [Vae] Make sure all vae's work with latent diffusion models by @patrickvonplaten in #5880
  • [Tests] Make sure that we don't run tests multiple times by @patrickvonplaten in #5949
  • [Community Pipeline] Diffusion Posterior Sampling for General Noisy Inverse Problems by @tongdaxu in #5939
  • [From_pretrained] Fix warning by @patrickvonplaten in #5948
  • [load_textual_inversion]: allow multiple tokens by @yiyixuxu in #5837
  • [docs] Fix space by @stevhliu in #5898
  • fix: minor typo in docstring by @soumik12345 in #5961
  • [ldm3d] Ldm3d upscaler to community pipeline by @estelleafl in #5870
  • [docs] Update pipeline list by @stevhliu in #5952
  • [Tests] Refactor test_examples.py for better readability by @sayakpaul in #5946
  • added doc for Kandinsky3.0 by @charchit7 in #5937
  • [bug fix] Inpainting for MultiAdapter by @affromero in #5922
  • Rename output_dir argument by @linhqyy in #5916
  • [LoRA refactor] move several state dict conversion utils out of lora.py by @sayakpaul in #5955
  • Support of ip-adapter to the StableDiffusionControlNetInpaintPipeline by @juancopi81 in #5887
  • [docs] LCM training by @stevhliu in #5796
  • Controlnet ssd 1b support by @MarkoKostiv in #5779
  • [Pipeline] Add TextToVideoZeroSDXLPipeline by @vahramtadevosyan in #4695
  • [Wuerstchen] Adapt lora training example scripts to use PEFT by @kashif in #5959
  • Fixed custom module importing on Windows by @PENGUINLIONG in #5891
  • Add SVD by @patil-suraj in #5895
  • [SDXL Turbo] Add some docs by @patrickvonplaten in #5982
  • Fix SVD doc by @patil-suraj in #5983

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @a-r-r-o-w
    • UnboundLocalError in SDXLInpaint.prepare_latents() (#5648)
    • [Community] [WIP] LCM Interpolation Pipeline (#5767)
    • Addition of new callbacks to controlnets (#5812)
    • Deprecate KarrasVeScheduler and ScoreSdeVpScheduler (#5269)
  • @dg845
    • Update LCMScheduler Inference Timesteps to be More Evenly Spaced (#5836)
    • Add Custom Timesteps Support to LCMScheduler and Supported Pipelines (#5874)
    • Fix LCM Stable Diffusion distillation bug related to parsing unet_time_cond_proj_dim (#5893)
  • @affromero
    • ControlNet+Adapter pipeline, and ControlNet+Adapter+Inpaint pipeline (#5869)
    • [bug fix] Inpainting for MultiAdapter (#5922)
  • @tongdaxu
    • [Community Pipeline] Diffusion Posterior Sampling for General Noisy Inverse Problems (#5939)
  • @estelleafl
    • [ldm3d] Ldm3d upscaler to community pipeline (#5870)
  • @vahramtadevosyan
    • [Pipeline] Add TextToVideoZeroSDXLPipeline (#4695)
diffusers - [Patch release] Make sure we install correct PEFT version

Published by patrickvonplaten 11 months ago

Small patch release to make sure the correct PEFT version is installed.

All commits

  • Improve setup.py and add dependency check by @patrickvonplaten in #5826
diffusers - v0.23.0: LCM LoRA, SDXL LCM, Consistency Decoder from DALL-E 3

Published by sayakpaul 11 months ago

LCM LoRA, LCM SDXL, Consistency Decoder

LCM LoRA

Latent Consistency Models (LCM) made quite the mark in the Stable Diffusion community by enabling ultra-fast inference. LCM author @luosiallen, alongside @patil-suraj and @dg845, managed to extend the LCM support for Stable Diffusion XL (SDXL) and pack everything into a LoRA.

The approach is called LCM LoRA.

Below is an example of using LCM LoRA, taking just 4 inference steps:

from diffusers import DiffusionPipeline, LCMScheduler
import torch

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
lcm_lora_id = "latent-consistency/lcm-lora-sdxl"

pipe = DiffusionPipeline.from_pretrained(model_id, variant="fp16", torch_dtype=torch.float16).to("cuda")

pipe.load_lora_weights(lcm_lora_id)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

prompt = "close-up photography of old man standing in the rain at night, in a street lit by lamps, leica 35mm summilux"
image = pipe(
    prompt=prompt,
    num_inference_steps=4,
    guidance_scale=1,
).images[0]

You can combine the LoRA with Img2Img, Inpaint, ControlNet, ...

as well as with other LoRAs 🤯
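For example, here is a minimal sketch of combining the LCM LoRA with a second style LoRA via adapter names (the papercut LoRA repository and its weight file are only assumed examples):

import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", variant="fp16", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# Load the LCM LoRA and a style LoRA under explicit adapter names, then combine them with weights.
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl", adapter_name="lcm")
pipe.load_lora_weights("TheLastBen/Papercut_SDXL", weight_name="papercut.safetensors", adapter_name="papercut")  # assumed example LoRA
pipe.set_adapters(["lcm", "papercut"], adapter_weights=[1.0, 0.8])

image = pipe("papercut, a cute fox", num_inference_steps=4, guidance_scale=1).images[0]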

image (31)

👉 Checkpoints
📜 Docs

If you want to learn more about the approach, please have a look at the following:

LCM SDXL

Continuing the work of Latent Consistency Models (LCM), we've applied the approach to SDXL as well and provide SSD-1B and SDXL fine-tuned checkpoints.

from diffusers import DiffusionPipeline, UNet2DConditionModel, LCMScheduler
import torch

unet = UNet2DConditionModel.from_pretrained(
    "latent-consistency/lcm-sdxl",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", unet=unet, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"

generator = torch.manual_seed(0)
image = pipe(
    prompt=prompt, num_inference_steps=4, generator=generator, guidance_scale=1.0
).images[0]

👉 Checkpoints
📜 Docs

Consistency Decoder

OpenAI open-sourced the consistency decoder used in DALL-E 3. It improves the decoding part in the Stable Diffusion v1 family of models.

import torch
from diffusers import StableDiffusionPipeline, ConsistencyDecoderVAE

vae = ConsistencyDecoderVAE.from_pretrained("openai/consistency-decoder", torch_dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", vae=vae, torch_dtype=torch.float16
).to("cuda")

pipe("horse", generator=torch.manual_seed(0)).images

Find the documentation here to learn more.

All commits

  • [Custom Pipelines] Make sure that community pipelines can use repo revision by @patrickvonplaten in #5659
  • post release (v0.22.0) by @sayakpaul in #5658
  • Add Pixart to AUTO_TEXT2IMAGE_PIPELINES_MAPPING by @Beinsezii in #5664
  • Update custom diffusion attn processor by @DN6 in #5663
  • Model tests xformers fixes by @DN6 in #5679
  • Update free model hooks by @DN6 in #5680
  • Fix Basic Transformer Block by @DN6 in #5683
  • Explicit torch/flax dependency check by @DN6 in #5673
  • [PixArt-Alpha] fix mask_feature so that precomputed embeddings work with a batch size > 1 by @sayakpaul in #5677
  • Make sure DDPM and diffusers can be used without Transformers by @sayakpaul in #5668
  • [PixArt-Alpha] Support non-square images by @sayakpaul in #5672
  • Improve LCMScheduler by @dg845 in #5681
  • [Docs] Fix typos, improve, update at Using Diffusers' Task page by @standardAI in #5611
  • Replacing the nn.Mish activation function with a get_activation function. by @hi-sushanta in #5651
  • speed up Shap-E fast test by @yiyixuxu in #5686
  • Fix the misaligned pipeline usage in dreamshaper docstrings by @kirill-fedyanin in #5700
  • Fixed is_safetensors_compatible() handling of windows path separators by @PhilLab in #5650
  • [LCM] Fix img2img by @patrickvonplaten in #5698
  • [PixArt-Alpha] fix mask feature condition. by @sayakpaul in #5695
  • Fix styling issues by @patrickvonplaten in #5699
  • Add adapter fusing + PEFT to the docs by @apolinario in #5662
  • Fix prompt bug in AnimateDiff by @DN6 in #5702
  • [Bugfix] fix error of peft lora when xformers enabled by @okotaku in #5697
  • Install accelerate from PyPI in PR test runner by @DN6 in #5721
  • consistency decoder by @williamberman in #5694
  • Correct consist dec by @patrickvonplaten in #5722
  • LCM Add Tests by @patrickvonplaten in #5707
  • [LCM] add: locm docs. by @sayakpaul in #5723
  • Add LCM Scripts by @patil-suraj in #5727
diffusers - v0.22.3: Fix PixArtAlpha and LCM Image-to-Image pipelines

Published by sayakpaul 11 months ago

🐛 There were some sneaky bugs in the PixArt-Alpha and LCM Image-to-Image pipelines which have been fixed in this release.

All commits

  • [LCM] Fix img2img by @patrickvonplaten in #5698
  • [PixArt-Alpha] fix mask feature condition. by @sayakpaul in #5695
diffusers - Patch Release v0.22.2: Fix Animate Diff, fix DDPM import, Pixart various

Published by patrickvonplaten 12 months ago

  • Fix Basic Transformer Block by @DN6 in #5683
  • [PixArt-Alpha] fix mask_feature so that precomputed embeddings work with a batch size > 1 by @sayakpaul in #5677
  • Make sure DDPM and diffusers can be used without Transformers by @sayakpaul in #5668
  • [PixArt-Alpha] Support non-square images by @sayakpaul in #5672
diffusers - Patch Release: Fix community vs. hub pipelines revision

Published by patrickvonplaten 12 months ago

  • [Custom Pipelines] Make sure that community pipelines can use repo revision by @patrickvonplaten
diffusers - v0.22.0: LCM, PixArt-Alpha, AnimateDiff, PEFT integration for LoRA, and more

Published by patrickvonplaten 12 months ago

Latent Consistency Models (LCM)

Untitled

LCMs enable a significantly faster inference process for diffusion models. They require far fewer inference steps to produce high-resolution images without compromising image quality too much. Below is a usage example:

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float32)

# To save GPU memory, torch.float16 can be used, but it may compromise image quality.
pipe.to(torch_device="cuda", torch_dtype=torch.float32)

prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"

# Can be set to 1~50 steps. LCM supports fast inference even with <= 4 steps. Recommended: 1~8 steps.
num_inference_steps = 4 

images = pipe(prompt=prompt, num_inference_steps=num_inference_steps, guidance_scale=8.0).images

Refer to the documentation to learn more.

LCM comes with both text-to-image and image-to-image pipelines and they were contributed by @luosiallen, @nagolinc, and @dg845.

PixArt-Alpha

PixArt-Alpha is a Transformer-based text-to-image diffusion model that rivals the quality of the existing state-of-the-art ones, such as Stable Diffusion XL, Imagen, and DALL-E 2, while being more efficient.

It was trained with T5 text embeddings and has a maximum sequence length of 120. Thus, it allows for more detailed prompt inputs, unlocking better-quality generations.

Despite the large text encoder, with model offloading it takes a little under 11 GB of VRAM to run the PixArtAlphaPipeline:

from diffusers import PixArtAlphaPipeline
import torch 

pipeline_id = "PixArt-alpha/PixArt-XL-2-1024-MS"
pipeline = PixArtAlphaPipeline.from_pretrained(pipeline_id, torch_dtype=torch.float16)
pipeline.enable_model_cpu_offload()

prompt = "A small cactus with a happy face in the Sahara desert."
image = pipeline(prompt).images[0]
image.save("sahara.png")

Check out the docs to learn more.

AnimateDiff

animatediff-doc

AnimateDiff is a modelling framework that allows you to create videos using pre-existing Stable Diffusion text-to-image models. It achieves this by inserting motion module layers into a frozen text-to-image model and training it on video clips to extract a motion prior.

These motion modules are applied after the ResNet and Attention blocks in the Stable Diffusion UNet. Their purpose is to introduce coherent motion across image frames. To support these modules, we introduce the concepts of a MotionAdapter and a UNetMotionModel. These serve as a convenient way to use these motion modules with existing Stable Diffusion models.

The following example demonstrates how you can utilize the motion modules with an existing Stable Diffusion text-to-image model.

import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers.utils import export_to_gif

# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")

# load SD 1.5 based finetuned model
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)
scheduler = DDIMScheduler.from_pretrained(
    model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1
)
pipe.scheduler = scheduler

# enable memory savings
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()

output = pipe(
    prompt=(
        "masterpiece, bestquality, highlydetailed, ultradetailed, sunset, "
        "orange sky, warm lighting, fishing boats, ocean waves seagulls, "
        "rippling water, wharf, silhouette, serene atmosphere, dusk, evening glow, "
        "golden hour, coastal landscape, seaside scenery"
    ),
    negative_prompt="bad quality, worse quality",
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=torch.Generator("cpu").manual_seed(42),
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")

You can convert an existing 2D UNet into a UNetMotionModel:

from diffusers import MotionAdapter, UNetMotionModel, UNet2DConditionModel

unet = UNetMotionModel()

# Load from an existing 2D UNet and MotionAdapter
unet2D = UNet2DConditionModel.from_pretrained("SG161222/Realistic_Vision_V5.1_noVAE", subfolder="unet")
motion_adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")

# load motion adapter here
unet_motion = UNetMotionModel.from_unet2d(unet2D, motion_adapter=motion_adapter)

# Or load motion modules after init
unet_motion.load_motion_modules(motion_adapter)

# freeze all 2D UNet layers except for the motion modules for finetuning
unet_motion.freeze_unet2d_params()

# Save only motion modules
unet_motion.save_motion_modules("<path to save model>", push_to_hub=True)

AnimateDiff also comes with motion LoRA modules, letting you control subtleties:

import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers.utils import export_to_gif

# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
# load SD 1.5 based finetuned model
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)
pipe.load_lora_weights("guoyww/animatediff-motion-lora-zoom-out", adapter_name="zoom-out")

scheduler = DDIMScheduler.from_pretrained(
    model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1
)
pipe.scheduler = scheduler

# enable memory savings
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()

output = pipe(
    prompt=(
        "masterpiece, bestquality, highlydetailed, ultradetailed, sunset, "
        "orange sky, warm lighting, fishing boats, ocean waves seagulls, "
        "rippling water, wharf, silhouette, serene atmosphere, dusk, evening glow, "
        "golden hour, coastal landscape, seaside scenery"
    ),
    negative_prompt="bad quality, worse quality",
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=torch.Generator("cpu").manual_seed(42),
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")

animatediff-zoom-out-lora
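Multiple motion LoRAs can also be combined. A brief sketch continuing from the pipeline above, assuming the PEFT backend is installed and that the pan-left LoRA repository name from the same guoyww collection is available:

# Assumption: guoyww/animatediff-motion-lora-pan-left exists alongside the zoom-out LoRA
pipe.load_lora_weights("guoyww/animatediff-motion-lora-pan-left", adapter_name="pan-left")

# Weight each motion LoRA to balance its effect
pipe.set_adapters(["zoom-out", "pan-left"], adapter_weights=[1.0, 0.8])

output = pipe(
    prompt="sunset over the ocean, fishing boats, golden hour, coastal landscape",
    negative_prompt="bad quality, worse quality",
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=torch.Generator("cpu").manual_seed(42),
)
export_to_gif(output.frames[0], "animation_combined.gif")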

Check out the documentation to learn more.

PEFT 🤝 Diffusers

There are many adapters (LoRA, for example) trained in different styles to achieve different effects. You can even combine multiple adapters to create new and unique images. With the 🤗 PEFT integration in 🤗 Diffusers, it is really easy to load and manage adapters for inference.

Here is an example of combining multiple LoRAs using this new integration:

from diffusers import DiffusionPipeline
import torch

pipe_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = DiffusionPipeline.from_pretrained(pipe_id, torch_dtype=torch.float16).to("cuda")

# Load LoRA 1.
pipe.load_lora_weights("CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors", adapter_name="toy")
# Load LoRA 2.
pipe.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel")

# Combine the adapters.
pipe.set_adapters(["pixel", "toy"], adapter_weights=[0.5, 1.0])

# Perform inference.
prompt = "toy_face of a hacker with a hoodie, pixel art"
image = pipe(
    prompt, num_inference_steps=30, cross_attention_kwargs={"scale": 1.0}, generator=torch.manual_seed(0)
).images[0]
image

Untitled 1
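Loaded adapters can also be switched or disabled at runtime. A brief sketch continuing the pipeline above, using the set_adapters and disable_lora helpers from the PEFT integration:

# Use only the pixel-art adapter for the next generation
pipe.set_adapters("pixel")
image = pipe(
    "a hacker with a hoodie, pixel art", num_inference_steps=30, generator=torch.manual_seed(0)
).images[0]

# Disable all LoRA adapters and fall back to the base model
pipe.disable_lora()
image = pipe("a hacker with a hoodie", num_inference_steps=30, generator=torch.manual_seed(0)).images[0]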

Refer to the documentation to learn more.

Community components with community pipelines

We have had support for community pipelines for a while now. This enables fast integration for pipelines we cannot directly integrate within the core codebase of the library. However, community pipelines always rely on the building blocks from Diffusers, which can be restrictive for advanced use cases.

To address this, we’re elevating community pipelines with community components starting with this release 🤗 By specifying trust_remote_code=True and writing the pipeline repository in a specific way, users can customize their pipeline and component code as flexibly as possible:

from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained(
    "<change-username>/<change-id>", trust_remote_code=True, torch_dtype=torch.float16
).to("cuda")

prompt = "hello"

# Text embeds
prompt_embeds, negative_embeds = pipeline.encode_prompt(prompt)

# Keyframes generation (8x64x40, 2fps)
video_frames = pipeline(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    num_frames=8,
    height=40,
    width=64,
    num_inference_steps=2,
    guidance_scale=9.0,
    output_type="pt"
).frames

Refer to the documentation to learn more.

Dynamic callbacks

Most 🤗 Diffusers pipelines now accept a callback_on_step_end argument that lets you change the default behavior of the denoising loop with custom-defined functions. Here is an example of a callback function that disables classifier-free guidance after 40% of the inference steps to save compute with a minimal tradeoff in performance.

def callback_dynamic_cfg(pipe, step_index, timestep, callback_kwargs):
    # adjust the batch_size of prompt_embeds according to guidance_scale
    if step_index == int(pipe.num_timesteps * 0.4):
        prompt_embeds = callback_kwargs["prompt_embeds"]
        prompt_embeds = prompt_embeds.chunk(2)[-1]

        # update guidance_scale and prompt_embeds
        pipe._guidance_scale = 0.0
        callback_kwargs["prompt_embeds"] = prompt_embeds
    return callback_kwargs

Here’s how you can use it:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"

generator = torch.Generator(device="cuda").manual_seed(1)
out = pipe(prompt, generator=generator, callback_on_step_end=callback_dynamic_cfg, callback_on_step_end_tensor_inputs=['prompt_embeds'])

out.images[0].save("out_custom_cfg.png")

Check out the docs to learn more.

All commits

  • [PEFT / LoRA ] Fix text encoder scaling by @younesbelkada in #5204
  • Fix doc KO unconditional_image_generation.md by @mishig25 in #5236
  • Flax: Ignore PyTorch, ONNX files when they coexist with Flax weights by @pcuenca in #5237
  • Fixed constants.py not using hugging face hub environment variable by @Zanz2 in #5222
  • Compile test fixes by @DN6 in #5235
  • [PEFT warnings] Only sure deprecation warnings in the future by @patrickvonplaten in #5240
  • Add docstrings in forward methods of adapter model by @Nandika-A in #5253
  • make style by @patrickvonplaten (direct commit on main)
  • [WIP] Refactor UniDiffuser Pipeline and Tests by @dg845 in #4948
  • fix: how print training resume logs. by @sayakpaul in #5117
  • Add docstring for the AutoencoderKL's decode by @freespirit in #5242
  • Add a docstring for the AutoencoderKL's encode by @freespirit in #5239
  • Update UniPC to support 1D diffusion. by @leng-yue in #5199
  • [Schedulers] Fix callback steps by @patrickvonplaten in #5261
  • make fix copies by @patrickvonplaten (direct commit on main)
  • [Research folder] Add SDXL example by @patrickvonplaten in #5275
  • Fix UniPC scheduler for 1D by @patrickvonplaten in #5276
  • New Pipeline Slow Test runners by @DN6 in #5131
  • handle case when controlnet is list or tuple by @noskill in #5179
  • make style by @patrickvonplaten (direct commit on main)
  • Zh doc by @WADreaming in #4807
  • ✨ [Core] Add FreeU mechanism by @kadirnar in #5164
  • pin torch version by @DN6 in #5297
  • add: entry for DDPO support. by @sayakpaul in #5250
  • Min-SNR Gamma: correct the fix for SNR weighted loss in v-prediction … by @bghira in #5238
  • Update bug-report.yml by @patrickvonplaten (direct commit on main)
  • Bump tolerance on shape test by @DN6 in #5289
  • Add from single file to StableDiffusionUpscalePipeline and StableDiffusionLatentUpscalePipeline by @DN6 in #5194
  • [LoRA] fix: torch.compile() for lora conv by @sayakpaul in #5298
  • [docs] Improved inpaint docs by @stevhliu in #5210
  • Minor fixes by @TimothyAlexisVass in #5309
  • [Hacktoberfest]Fixing issues #5241 by @jgyfutub in #5255
  • Update README.md by @ShubhamJagtap2000 in #5267
  • fix typo in train dreambooth lora description by @themez in #5332
  • Fix [core/GLIGEN]: TypeError when iterating over 0-d tensor with In-painting mode when EulerAncestralDiscreteScheduler is used by @rchuzh99 in #5305
  • fix inference in custom diffusion by @caopulan in #5329
  • Improve performance of fast test by reducing down blocks by @sepal in #5290
  • make-fast-test-for-StableDiffusionControlNetPipeline-faster by @m0saan in #5292
  • Improve typehints and docs in diffusers/models by @a-r-r-o-w in #5299
  • Add py.typed for PEP 561 compliance by @byarbrough in #5326
  • [HacktoberFest] Add missing docstrings to diffusers/models by @a-r-r-o-w in #5248
  • make style by @patrickvonplaten (direct commit on main)
  • Fix links in docs to adapter code by @johnowhitaker in #5323
  • replace references to deprecated KeyArray & PRNGKeyArray by @jakevdp in #5324
  • Fix loading broken LoRAs that could give NaN by @patrickvonplaten in #5316
  • [JAX] Replace uses of jnp.array in types with jnp.ndarray. by @hvaara in #4719
  • Add missing dependency in requirements file by @juliensimon in #5345
  • fix problem of 'accelerator.is_main_process' to run in mutiple GPUs by @jiaqiw09 in #5340
  • [docs] Create a mask for inpainting by @stevhliu in #5322
  • Adding PyTorch XLA support for sdxl inference by @ssusie in #5273
  • [Examples] use loralinear instead of depecrecated lora attn procs. by @sayakpaul in #5331
  • Improve typehints and docs in diffusers/models by @a-r-r-o-w in #5312
  • Fix StableDiffusionXLImg2ImgPipeline creation in sdxl tutorial by @soumik12345 in #5367
  • I Added Doc-String Into The class. by @hi-sushanta in #5293
  • make style by @patrickvonplaten (direct commit on main)
  • [docs] Minor fixes by @stevhliu in #5369
  • New xformers test runner by @DN6 in #5349
  • [Core] Add FreeU to all the core pipelines and their (mostly-used) derivatives by @sayakpaul in #5376
  • [core / PEFT / LoRA] Integrate PEFT into Unet by @younesbelkada in #5151
  • [Bot] FIX stale.py uses timezone-aware datetime by @sayakpaul in #5396
  • [Examples] fix unconditioning generation training example for mixed-precision training by @sayakpaul in #5407
  • [Wuerstchen] text to image training script by @kashif in #5052
  • [Docs] add docs on peft diffusers integration by @sayakpaul in #5359
  • chore: fix typos by @afuetterer in #5386
  • [Examples] Update with HFApi by @sayakpaul in #5393
  • Add ability to mix usage of T2I-Adapter(s) and ControlNet(s). by @GreggHelt2 in #5362
  • make style by @patrickvonplaten (direct commit on main)
  • [Core] Fix/pipeline without text encoders for SDXL by @sayakpaul in #5301
  • [Examples] Follow up of #5393 by @sayakpaul in #5420
  • changed channel parameters for UNET and VAE. Changed configs parameters of CLIPText by @aeros29 in #5370
  • Chore: Typo fixed in multiple files by @SusheelThapa in #5422
  • Update base image for slow CUDA tests by @DN6 in #5426
  • Fix pipe fetcher for slow tests by @DN6 in #5424
  • make fix copies by @patrickvonplaten (direct commit on main)
  • Merge branch 'main' of https://github.com/huggingface/diffusers by @patrickvonplaten (direct commit on main)
  • [from_single_file()]fix: local single file loading. by @sayakpaul in #5440
  • Add latent consistency by @patrickvonplaten in #5438
  • Update-DeepFloyd-IF-Pipelines-Docstrings by @m0saan in #5304
  • style(sdxl): remove identity assignments by @liang-hou in #5418
  • Fix the order of width and height of original size in SDXL training script by @linjiapro in #5382
  • make style by @patrickvonplaten (direct commit on main)
  • Beautiful Doc string added into the UNetMidBlock2D class. by @hi-sushanta in #5389
  • make style by @patrickvonplaten (direct commit on main)
  • fix une2td ignoring class_labels by @kesimeg in #5401
  • Added support to create asymmetrical U-Net structures by @Gothos in #5400
  • [PEFT] Fix scale unscale with LoRA adapters by @younesbelkada in #5417
  • Make T2I-Adapter downscale padding match the UNet by @RyanJDick in #5435
  • Update README.md by @anvilarth in #5497
  • fixed SDXL text encoder training bug #5016 by @shyammarjit in #5078
  • make style by @patrickvonplaten (direct commit on main)
  • [torch.compile] fix graph break problems partially by @sayakpaul in #5453
  • Fix Slow Tests by @DN6 in #5469
  • Fix typo in controlnet docs by @MrSyee in #5486
  • [BUG] in transformer_temporal Fix Bugs by @zideliu in #5496
  • [docs] Fix links by @stevhliu in #5499
  • fix a few issues in controlnet inpaint pipelines by @yiyixuxu in #5470
  • Fixed autoencoder typo by @abhisharsinha in #5500
  • [Core] Refactor activation and normalization layers by @sayakpaul in #5493
  • Register BaseOutput subclasses as supported torch.utils._pytree nodes by @BowenBao in #5459
  • Japanese docs by @isamu-isozaki in #5478
  • [docs] General updates by @stevhliu in #5378
  • Add Latent Consistency Models Pipeline by @dg845 in #5448
  • fix typo by @mymusise in #5505
  • fix error of peft lora when xformers enabled by @AnyISalIn in #5506
  • fix a bug in 2nd order schedulers when using in ensemble of experts config by @yiyixuxu in #5511
  • [Schedulers] Fix 2nd order other than heun by @patrickvonplaten in #5526
  • Add a new community pipeline by @nagolinc in #5477
  • make style by @patrickvonplaten (direct commit on main)
  • Improve typehints and docs in diffusers/models by @a-r-r-o-w in #5391
  • make fix-copies by @patrickvonplaten (direct commit on main)
  • Fix missing punctuation in PHILOSOPHY.md by @RampagingSloth in #5530
  • fix a bug on torch_dtype argument in from_single_file of ControlNetModel by @xuyxu in #5528
  • [docs] Loader docs by @stevhliu in #5473
  • Add from_pt flag to enable model from PT by @RissyRan in #5501
  • Remove multiple if-else statement in the get_activation function. by @hi-sushanta in #5446
  • [Tests] Speed up expert of mixture tests by @patrickvonplaten in #5533
  • [Tests] Optimize test configurations for faster execution by @p1kit in #5535
  • [Remote code] Add functionality to run remote models, schedulers, pipelines by @patrickvonplaten in #5472
  • Update train_dreambooth.py - fix typos by @nickkolok in #5539
  • correct checkpoint in kandinsky2.2 doc page by @yiyixuxu in #5550
  • [Core] fix FreeU disable method by @sayakpaul in #5552
  • [docs] Internal classes API by @stevhliu in #5513
  • fix error reported 'find_unused_parameters' running in mutiple GPUs by @jiaqiw09 in #5355
  • docs: initial pt translation by @SirMonteiro in #5549
  • Fix moved _expand_mask function by @patrickvonplaten in #5581
  • [PEFT / Tests] Add peft slow tests on push by @younesbelkada in #5419
  • Add realfill by @thuanz123 in #5456
  • add fix to be able use StableDiffusionXLAdapterPipeline.from_single_file by @pshtif in #5547
  • Stabilize DPM++, especially for SDXL and SDE-DPM++ by @LuChengTHU in #5541
  • Fix incorrect loading of custom pipeline by @a-r-r-o-w in #5568
  • [core / PEFT ]Bump transformers min version for PEFT integration by @younesbelkada in #5579
  • Fix divide by zero RuntimeWarning by @TimothyAlexisVass in #5543
  • [Community Pipelines] add textual inversion support for stable_diffusion_ipex by @miaojinc in #5571
  • fix a mistake in text2image training script for kandinsky2.2 by @yiyixuxu in #5244
  • Update docker image for xformers by @DN6 in #5597
  • [Docs] Fix typos by @standardAI in #5583
  • [Docs] Fix typos, improve, update at Tutorials page by @standardAI in #5586
  • [docs] Lu lambdas by @stevhliu in #5602
  • Update final CPU offloading code for more diffusion pipelines by @clarencechen in #5589
  • [Core] enable lora for sdxl adapters too and add slow tests. by @ilisparrow in #5555
  • fix by @patrickvonplaten (direct commit on main)
  • Remove Redundant Variables from Encoder and Decoder by @hi-sushanta in #5569
  • Revert "Fix the order of width and height of original size in SDXL training script" by @patrickvonplaten in #5614
  • [PEFT / LoRA] Fix civitai bug when network alpha is an empty dict by @younesbelkada in #5608
  • [Docs] Fix typos, improve, update at Get Started page by @standardAI in #5587
  • [SDXL Adapter] Revert load lora by @patrickvonplaten in #5615
  • [docs] Kandinsky guide by @stevhliu in #4555
  • [remote code] document trust remote code. by @sayakpaul in #5620
  • [Tests] Fix cpu offload test by @patrickvonplaten in #5626
  • [Docs] Fix typos, improve, update at Conceptual Guides page by @standardAI in #5585
  • Animatediff Proposal by @DN6 in #5413
  • [Docs] Fix typos, improve, update at Using Diffusers' Loading & Hub page by @standardAI in #5584
  • [LCM] Make sure img2img works by @patrickvonplaten in #5632
  • Update animatediff docs to include section on Motion LoRAs by @DN6 in #5639
  • [Easy] Minor AnimateDiff Doc nits by @sayakpaul in #5640
  • fix a bug in AutoPipeline.from_pipe() when creating a controlnet pipeline from an existing controlnet by @yiyixuxu in #5638
  • [Easy] clean up the LCM docstrings. by @sayakpaul in #5637
  • Model loading speed optimization by @RyanJDick in #5635
  • Clean up LCM Pipeline and Test Code. by @dg845 in #5641
  • [Docs] Fix typos, improve, update at Using Diffusers' Tecniques page by @standardAI in #5627
  • [Core] support for tiny autoencoder in img2img by @sayakpaul in #5636
  • Remove the redundant line from the adapter.py file. by @hi-sushanta in #5618
  • add callbacks to denoising step by @yiyixuxu in #5427
  • [Feat] PixArt-Alpha by @sayakpaul in #5642
  • correct pipeline class name by @sayakpaul in #5652

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @dg845
    • [WIP] Refactor UniDiffuser Pipeline and Tests (#4948)
    • Add Latent Consistency Models Pipeline (#5448)
    • Clean up LCM Pipeline and Test Code. (#5641)
  • @kadirnar
    • ✨ [Core] Add FreeU mechanism (#5164)
  • @a-r-r-o-w
    • Improve typehints and docs in diffusers/models (#5299)
    • [HacktoberFest] Add missing docstrings to diffusers/models (#5248)
    • Improve typehints and docs in diffusers/models (#5312)
    • Improve typehints and docs in diffusers/models (#5391)
    • Fix incorrect loading of custom pipeline (#5568)
  • @isamu-isozaki
    • Japanese docs (#5478)
  • @nagolinc
    • Add a new community pipeline (#5477)
  • @SirMonteiro
    • docs: initial pt translation (#5549)
  • @thuanz123
    • Add realfill (#5456)
  • @standardAI
    • [Docs] Fix typos (#5583)
    • [Docs] Fix typos, improve, update at Tutorials page (#5586)
    • [Docs] Fix typos, improve, update at Get Started page (#5587)
    • [Docs] Fix typos, improve, update at Conceptual Guides page (#5585)
    • [Docs] Fix typos, improve, update at Using Diffusers' Loading & Hub page (#5584)
    • [Docs] Fix typos, improve, update at Using Diffusers' Tecniques page (#5627)