Releases: huggingface/diffusers
v0.8.0: Versatile Diffusion - Text, Images and Variations All in One Diffusion Model
🙆‍♀️ New Models
VersatileDiffusion
VersatileDiffusion, released by SHI-Labs, is a unified multi-flow multimodal diffusion model capable of multiple tasks such as text-to-image generation, image variations, dual-guided (text + image) image generation, and image-to-text.
- [Versatile Diffusion] Add versatile diffusion model by @patrickvonplaten @anton-l #1283
Make sure to install `transformers` from `main`:

```bash
pip install git+https://github.com/huggingface/transformers
```
Then you can run:
```python
from diffusers import VersatileDiffusionPipeline
import torch
import requests
from io import BytesIO
from PIL import Image

pipe = VersatileDiffusionPipeline.from_pretrained("shi-labs/versatile-diffusion", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# initial image
url = "https://huggingface.co/datasets/diffusers/images/resolve/main/benz.jpg"
response = requests.get(url)
image = Image.open(BytesIO(response.content)).convert("RGB")

# prompt
prompt = "a red car"

# text to image
image = pipe.text_to_image(prompt).images[0]

# image variation
image = pipe.image_variation(image).images[0]

# dual guided (text + image)
image = pipe.dual_guided(prompt, image).images[0]
```
More in-depth details can be found in the documentation.
AltDiffusion
AltDiffusion is a multilingual latent diffusion model that supports text-to-image generation for 9 different languages: English, Chinese, Spanish, French, Japanese, Korean, Arabic, Russian and Italian.
- Add AltDiffusion by @patrickvonplaten @patil-suraj #1299
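A minimal sketch of usage (the `BAAI/AltDiffusion` checkpoint id is an assumption; check the model card on the Hub):

```python
import torch
from diffusers import AltDiffusionPipeline

# Checkpoint id is an assumption; see the AltDiffusion model card
pipe = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Prompts can be written in any of the nine supported languages, e.g. Chinese
image = pipe("一辆红色的汽车").images[0]
```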
Stable Diffusion Image Variations
StableDiffusionImageVariationPipeline by @justinpinkney is a Stable Diffusion model that takes an image as input and generates variations of that image. It is conditioned on CLIP image embeddings instead of text.
- StableDiffusionImageVariationPipeline by @patil-suraj #1365
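A short sketch of the pipeline in use (the `lambdalabs/sd-image-variations-diffusers` checkpoint id is an assumption; see the pipeline docs for the released weights):

```python
from io import BytesIO

import requests
from PIL import Image
from diffusers import StableDiffusionImageVariationPipeline

# Checkpoint id is an assumption; check the model card for the released weights
pipe = StableDiffusionImageVariationPipeline.from_pretrained("lambdalabs/sd-image-variations-diffusers")
pipe = pipe.to("cuda")

url = "https://huggingface.co/datasets/diffusers/images/resolve/main/benz.jpg"
image = Image.open(BytesIO(requests.get(url).content)).convert("RGB")

# No prompt is needed: the input image itself conditions the generation
variation = pipe(image).images[0]
```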
Safe Latent Diffusion
Safe Latent Diffusion (SLD), released by the ml-research group at TU Darmstadt, is a new practical and sophisticated approach to prevent unsolicited content from being generated by diffusion models. One of the authors of the research contributed their implementation to `diffusers`.
- Add Safe Stable Diffusion Pipeline by @manuelbrack #1244
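A minimal sketch (the `AIML-TUDA/stable-diffusion-safe` checkpoint id is an assumption; see the Safe Stable Diffusion docs):

```python
from diffusers import StableDiffusionPipelineSafe

# Checkpoint id is an assumption; see the pipeline documentation
pipe = StableDiffusionPipelineSafe.from_pretrained("AIML-TUDA/stable-diffusion-safe")
pipe = pipe.to("cuda")

# SLD steers the sampling process away from unsafe concepts
image = pipe("a photo of an astronaut riding a horse on mars").images[0]
```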
VQ-Diffusion with classifier-free sampling
- VQ-Diffusion classifier-free sampling by @williamberman #1294
LDM super resolution
LDM super resolution is a latent 4x super-resolution diffusion model released by CompVis.
- Add LDM Super Resolution pipeline by @duongna21 #1116
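A quick sketch (the `CompVis/ldm-super-resolution-4x-openimages` checkpoint id is an assumption; see the pipeline docs):

```python
from io import BytesIO

import requests
from PIL import Image
from diffusers import LDMSuperResolutionPipeline

# Checkpoint id is an assumption; see the LDM super resolution docs
pipe = LDMSuperResolutionPipeline.from_pretrained("CompVis/ldm-super-resolution-4x-openimages")
pipe = pipe.to("cuda")

url = "https://huggingface.co/datasets/diffusers/images/resolve/main/benz.jpg"
low_res = Image.open(BytesIO(requests.get(url).content)).convert("RGB").resize((128, 128))

# Produces a 4x upscaled (here 512x512) image
upscaled = pipe(image=low_res, num_inference_steps=100).images[0]
```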
CycleDiffusion
CycleDiffusion is a method that uses text-to-image diffusion models for image-to-image editing. It is capable of:
- Zero-shot image-to-image translation with text-to-image diffusion models such as Stable Diffusion.
- Traditional unpaired image-to-image translation with diffusion models trained on two related domains.
CLIPSeg + StableDiffusionInpainting
Uses CLIPSeg to automatically generate a mask using segmentation, and then applies Stable Diffusion in-painting.
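A hedged sketch of the idea (the `CIDAS/clipseg-rd64-refined` checkpoint and the exact pre-/post-processing are assumptions; see the community pipeline for the actual implementation):

```python
from io import BytesIO

import numpy as np
import requests
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation
from diffusers import StableDiffusionInpaintPipeline

# CLIPSeg checkpoint id is an assumption
processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
seg_model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

url = "https://huggingface.co/datasets/diffusers/images/resolve/main/benz.jpg"
image = Image.open(BytesIO(requests.get(url).content)).convert("RGB").resize((512, 512))

# Segment the region to repaint using a text query
inputs = processor(text=["a car"], images=[image], return_tensors="pt")
with torch.no_grad():
    logits = seg_model(**inputs).logits  # low-resolution heatmap

mask = Image.fromarray((torch.sigmoid(logits).numpy() * 255).astype(np.uint8)).resize((512, 512))

pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting").to("cuda")
result = pipe(prompt="a red sofa", image=image, mask_image=mask).images[0]
```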
K-Diffusion wrapper
The K-Diffusion pipeline is a community pipeline that allows using any sampler from k-diffusion with `diffusers` models.
- [Community Pipelines] K-Diffusion Pipeline by @patrickvonplaten #1360
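A minimal sketch of loading the wrapper (the community pipeline name `sd_text2img_k_diffusion` and its `set_scheduler` helper are assumptions based on the community examples):

```python
from diffusers import DiffusionPipeline

# Community pipeline name and set_scheduler helper are assumptions
pipe = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", custom_pipeline="sd_text2img_k_diffusion"
)
pipe = pipe.to("cuda")

pipe.set_scheduler("sample_heun")  # any k-diffusion sampler name
image = pipe("an astronaut riding a horse on mars", num_inference_steps=20).images[0]
```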
🌀 New SOTA Scheduler
DPMSolverMultistepScheduler is the 🧨 diffusers implementation of DPM-Solver++, a state-of-the-art scheduler that was contributed by one of the authors of the paper. This scheduler is able to achieve great quality in as few as 20 steps. It's a drop-in replacement for the default Stable Diffusion scheduler, so you can use it to essentially halve generation times. It works so well that we adopted it for the Stable Diffusion demo Spaces: https://huggingface.co/spaces/stabilityai/stable-diffusion, https://huggingface.co/spaces/runwayml/stable-diffusion-v1-5.
You can use it like this:
```python
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

repo_id = "runwayml/stable-diffusion-v1-5"
scheduler = DPMSolverMultistepScheduler.from_pretrained(repo_id, subfolder="scheduler")
stable_diffusion = DiffusionPipeline.from_pretrained(repo_id, scheduler=scheduler)
```
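With the scheduler swapped in, a low step count is usually enough; a short sketch (the prompt is just an example):

```python
stable_diffusion = stable_diffusion.to("cuda")

# DPM-Solver++ typically reaches good quality in ~20 steps
image = stable_diffusion("a photo of an astronaut riding a horse on mars", num_inference_steps=20).images[0]
```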
🌐 Better scheduler API
The example above also demonstrates how to load schedulers using a new API that is coherent with model loading and therefore more natural and intuitive.
You can load a scheduler using `from_pretrained`, as demonstrated above, or you can instantiate one from an existing scheduler configuration. This is a way to replace the scheduler of a pipeline that was previously loaded:
```python
from diffusers import DiffusionPipeline, EulerDiscreteScheduler

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)
```
Read more about these changes in the documentation. See also the community pipeline that allows using any of the K-diffusion samplers with `diffusers`, as mentioned above!
🎉 Performance
We work relentlessly to incorporate performance optimizations and memory reduction techniques to 🧨 diffusers. These are two of the most noteworthy incorporations in this release:
- Enable memory-efficient attention by default if xFormers is installed.
- Use batched-matmuls when possible.
🎁 Quality of Life improvements
- Fix/Enable all schedulers for in-painting
- Easier loading of local pipelines
- CPU offloading: multi-GPU support
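As a hedged sketch of CPU offloading (the method name follows the 0.8.0 docs; behavior may differ across versions):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)

# Submodules stay on the CPU and are moved to the GPU only while they run,
# trading some speed for a much smaller VRAM footprint
pipe.enable_sequential_cpu_offload()

image = pipe("a red car").images[0]
```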
📝 Changelog
- Add multistep DPM-Solver discrete scheduler by @LuChengTHU in #1132
- Remove warning about half precision on MPS by @pcuenca in #1163
- Fix typo latens -> latents by @duongna21 in #1171
- Fix community pipeline links by @pcuenca in #1162
- [Docs] Add loading script by @patrickvonplaten in #1174
- Fix dtype safety checker inpaint legacy by @patrickvonplaten in #1137
- Community pipeline img2img inpainting by @vvvm23 in #1114
- [Community Pipeline] Add multilingual stable diffusion to community pipelines by @juancopi81 in #1142
- [Flax examples] Load text encoder from subfolder by @duongna21 in #1147
- Link to Dreambooth blog post instead of W&B report by @pcuenca in #1180
- Fix small typo by @pcuenca in #1178
- [DDIMScheduler] fix noise device in ddim step by @patil-suraj in #1189
- MPS schedulers: don't use float64 by @pcuenca in #1169
- Warning for invalid options without "--with_prior_preservation" by @shirayu in #1065
- [ONNX] Improve ONNXPipeline scheduler compatibility, fix safety_checker by @anton-l in #1173
- Restore compatibility with deprecated `StableDiffusionOnnxPipeline` by @pcuenca in #1191
- Update pr docs actions by @mishig25 in #1194
- handle dtype xformers attention by @patil-suraj in #1196
- [Scheduler] Move predict epsilon to init by @patrickvonplaten in #1155
- add licenses to pipelines by @natolambert in #1201
- Fix cpu offloading by @anton-l in #1177
- Fix slow tests by @patrickvonplaten in #1210
- [Flax] fix extra copy pasta 🍝 by @camenduru in #1187
- [CLIPGuidedStableDiffusion] support DDIM scheduler by @patil-suraj in #1190
- Fix layer names convert LDM script by @duongna21 in #1206
- [Loading] Make sure loading edge cases work by @patrickvonplaten in #1192
- Add LDM Super Resolution pipeline by @duongna21 in #1116
- [Conversion] Improve conversion script by @patrickvonplaten in #1218
- DDIM docs by @patrickvonplaten in #1219
- apply `repeat_interleave` fix for `mps` to stable diffusion image2image pipeline by @jncasey in #1135
- Flax tests: don't hardcode number of devices by @pcuenca in #1175
- Improve documentation for the LPW pipeline by @exo-pla-net in #1182
- Factor out encode text with Copied from by @patrickvonplaten in #1224
- Match the generator device to the pipeline for DDPM and DDIM by @anton-l in #1222
- [Tests] Fix mps+generator fast tests by @anton-l in #1230
- [Tests] Adjust TPU test values by @anton-l in #1233
- Add a reference to the name 'Sampler' by @apolinario in #1172
- Fix Flax usage comments by @pcuenca in #1211
- [Docs] improve img2img example by @ruanrz in #1193
- [Stable Diffusion] Fix padding / truncation by @patrickvonplaten in #1226
- Finalize stable diffusion refactor by @patrickvonplaten in #1269
- Edited attention.py for older xformers by @Lime-Cakes in #1270
- Fix wrong link in text2img fine-tuning documentation by @daspartho in #1282
- [StableDiffusionInpaintPipeline] fix batch_size for mask and masked latents by @patil-suraj in #1279
- Add UNet 1d for RL model for planning + colab by @natolambert in #105
- Fix documentation typo for `UNet2DModel` and `UNet2DConditionModel` by @xenova in #1275
- add source link to composable diffusion model by @nanliu1 in #1293
- Fix incorrect link to Stable Diffusion notebook by @dhruvrnaik in #1291
- [dreambooth] link to bitsandbytes readme for installation by @0xdevalias in #1229
- Add Scheduler.from_pretrained and better scheduler changing by @patrickvonplaten in #1286
- Add AltDiffusion by @patrickvonplaten in #1299
- Better error messag...
v0.7.2: Patch release
v0.7.1: Patch release
This patch release makes `accelerate` a soft dependency to avoid an error when installing `diffusers` with a pre-existing `torch`.
- Move accelerate to a soft-dependency #1134 by @patrickvonplaten
v0.7.0: Optimized for Apple Silicon, Improved Performance, Awesome Community
❤️ PyTorch + Accelerate
`diffusers` now uses `accelerate` for improved model loading times!
Install Diffusers with `pip install --upgrade diffusers[torch]` to get everything in a single command.
🍎 Apple Silicon support with PyTorch 1.13
PyTorch and Apple have been working on improving `mps` support in PyTorch 1.13, so Apple Silicon is now a first-class citizen in diffusers 0.7.0!
Requirements
- Mac computer with Apple silicon (M1/M2) hardware.
- macOS 12.6 or later (13.0 or later recommended, as support is even better).
- arm64 version of Python.
- PyTorch 1.13.0 official release, installed from pip or the conda channels.
Memory efficient generation
Memory management is crucial to achieve fast generation speed. We recommend always using attention slicing on Apple Silicon, as it drastically reduces memory pressure and prevents paging or swapping. This is especially important for computers with less than 64 GB of unified memory, and may be the difference between generating an image in seconds rather than minutes. Use it like this:
```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("mps")

# Recommended if your computer has < 64 GB of RAM
pipe.enable_attention_slicing()

prompt = "a photo of an astronaut riding a horse on mars"

# First-time "warmup" pass
_ = pipe(prompt, num_inference_steps=1)

image = pipe(prompt).images[0]
image.save("astronaut.png")
```
Continuous Integration
Our automated tests now include a full battery of tests on the `mps` device. This will be helpful to identify issues early and ensure the quality on Apple Silicon going forward.
See more details in the documentation.
💃 Dance Diffusion
diffusers goes audio 🎵 Dance Diffusion by Harmonai is the first audio model in 🧨 Diffusers!
- [Dance Diffusion] Add dance diffusion by @patrickvonplaten #803
Try it out to generate some random music:
```python
from diffusers import DiffusionPipeline
from scipy.io import wavfile

model_id = "harmonai/maestro-150k"
pipeline = DiffusionPipeline.from_pretrained(model_id)
pipeline = pipeline.to("cuda")

audio = pipeline(audio_length_in_s=4.0).audios[0]

# To save locally
wavfile.write("maestro_test.wav", pipeline.unet.sample_rate, audio.transpose())
```
🎉 Euler schedulers
These are the Euler schedulers, from the paper Elucidating the Design Space of Diffusion-Based Generative Models by Karras et al. (2022). The diffusers implementation is based on the original k-diffusion implementation by Katherine Crowson. The Euler schedulers are fast, often generating really good outputs with 20-30 steps.
```python
import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

euler_scheduler = EulerDiscreteScheduler.from_config("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", scheduler=euler_scheduler, revision="fp16", torch_dtype=torch.float16
)
pipeline.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipeline(prompt, num_inference_steps=25).images[0]
```
```python
import torch
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler

euler_ancestral_scheduler = EulerAncestralDiscreteScheduler.from_config("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", scheduler=euler_ancestral_scheduler, revision="fp16", torch_dtype=torch.float16
)
pipeline.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipeline(prompt, num_inference_steps=25).images[0]
```
🔥 Up to 2x faster inference with memory_efficient_attention
Even faster and more memory-efficient Stable Diffusion using the flash attention implementation from xformers.
- Up to 2x speedup on GPUs using memory efficient attention by @MatthieuTPHR #532
To leverage it, just make sure you have:
- PyTorch > 1.12
- CUDA available
- the xformers library installed
```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    revision="fp16",
    torch_dtype=torch.float16,
).to("cuda")

pipe.enable_xformers_memory_efficient_attention()

with torch.inference_mode():
    sample = pipe("a small cat")

# optional: You can disable it via
# pipe.disable_xformers_memory_efficient_attention()
```
🚀 Much faster loading
Thanks to `accelerate`, pipeline loading is much, much faster. There are two parts to it:
- First, when a model is created, PyTorch initializes its weights by default. This takes a good amount of time. Using `low_cpu_mem_usage` (enabled by default), no initialization will be performed.
- Optionally, you can also use `device_map="auto"` to automatically select the best device(s) where the pre-trained weights will be initially sent to.
In our tests, loading time was more than halved on CUDA devices, and went down from 12s to 4s on an Apple M1 computer.
As a side effect, CPU usage will be greatly reduced during loading, because no temporary copies of the weights are necessary.
This feature requires PyTorch 1.9 or better and accelerate 0.8.0 or higher.
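A short sketch of opting into automatic device placement (requires `accelerate`; the checkpoint is just an example):

```python
from diffusers import StableDiffusionPipeline

# low_cpu_mem_usage is enabled by default; device_map="auto" sends weights
# to the best available device(s) as they are loaded
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    device_map="auto",
)
```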
🎨 RePaint
RePaint allows reusing any pretrained DDPM model for free-form inpainting by adding restarts to the denoising schedule. Based on the paper RePaint: Inpainting using Denoising Diffusion Probabilistic Models by Andreas Lugmayr et al.
```python
import torch
from diffusers import RePaintPipeline, RePaintScheduler

# original_image and mask_image are 256x256 PIL images (RGB)

# Load the RePaint scheduler and pipeline based on a pretrained DDPM model
scheduler = RePaintScheduler.from_config("google/ddpm-ema-celebahq-256")
pipe = RePaintPipeline.from_pretrained("google/ddpm-ema-celebahq-256", scheduler=scheduler)
pipe = pipe.to("cuda")

generator = torch.Generator(device="cuda").manual_seed(0)
output = pipe(
    original_image=original_image,
    mask_image=mask_image,
    num_inference_steps=250,
    eta=0.0,
    jump_length=10,
    jump_n_sample=10,
    generator=generator,
)
inpainted_image = output.images[0]
```
🌍 Community Pipelines
Long Prompt Weighting Stable Diffusion
The pipeline lets you input a prompt without the 77-token length limit. You can increase a word's weighting by using "()" or decrease it by using "[]". The pipeline also lets you use the main use cases of the Stable Diffusion pipeline in a single class.
For a code example, see Long Prompt Weighting Stable Diffusion
Speech to Image
Generate an image from an audio sample using pre-trained OpenAI whisper-small and Stable Diffusion.
For a code example, see Speech to Image
- [Examples] add speech to image pipeline example by @MikailINTech in #897
Wildcard Stable Diffusion
A minimal implementation that allows users to add "wildcards", denoted by `__wildcard__`, to prompts; these are used as placeholders for randomly sampled values given by either a dictionary or a .txt file.
For a code example, see Wildcard Stable Diffusion
- Wildcard stable diffusion pipeline by @shyamsn97 in #900
Composable Stable Diffusion
Use logic operators to do compositional generation.
For a code example, see Composable Stable Diffusion
Imagic Stable Diffusion
Image editing with Stable Diffusion.
For a code example, see Imagic Stable Diffusion
Seed Resizing
Allows generating a larger image while keeping the content of the original image.
For a code example, see Seed Resizing
📝 Changelog
- [Community Pipelines] Long Prompt Weighting Stable Diffusion Pipelines by @SkyTNT in #907
- [Stable Diffusion] Add components function by @patrickvonplaten in #889
- [PNDM Scheduler] Make sure list cannot grow forever by @patrickvonplaten in #882
- [DiffusionPipeline.from_pretrained] add warning when passing unused k… by @patrickvonplaten in #870
- DOC Dreambooth Add --sample_batch_size=1 to the 8 GB dreambooth example script by @leszekhanusz in #829
- [Examples] add speech to image pipeline example by @MikailINTech in #897
- [dreambooth] dont use safety check when generating prior images by @patil-suraj in #922
- Dreambooth class image generation: ...
v0.6.0: Finetuned Stable Diffusion inpainting
🎨 Finetuned Stable Diffusion inpainting
The first official stable diffusion checkpoint fine-tuned on inpainting has been released.
You can try it out in the official demo here, or code it up yourself 💻:
```python
from io import BytesIO

import torch
import PIL
import requests
from diffusers import StableDiffusionInpaintPipeline

def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    revision="fp16",
    torch_dtype=torch.float16,
)
pipe.to("cuda")

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
output = pipe(prompt=prompt, image=image, mask_image=mask_image)
image = output.images[0]
```
This gives an output image in which the masked region has been filled in according to the prompt "Face of a yellow cat, high resolution, sitting on a park bench" (the original release notes show a table of input image, mask, prompt, and output; the images are omitted here).
The previous, experimental inpainting pipeline has been moved to `StableDiffusionInpaintPipelineLegacy` (see the context below).
The new `StableDiffusionInpaintPipeline` is based on a Stable Diffusion model finetuned for the inpainting task: https://huggingface.co/runwayml/stable-diffusion-inpainting
Note: When loading `StableDiffusionInpaintPipeline` with a non-finetuned model (i.e. one saved with `diffusers<=0.5.1`), the pipeline will default to `StableDiffusionInpaintPipelineLegacy`, to maintain backward compatibility ✨
```python
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
assert pipe.__class__.__name__ == "StableDiffusionInpaintPipelineLegacy"
```
Context:
Why this change? When Stable Diffusion came out ~2 months ago, there were many unofficial in-painting demos using the original v1-4 checkpoint (`"CompVis/stable-diffusion-v1-4"`). These demos worked reasonably well, so we integrated an experimental `StableDiffusionInpaintPipeline` class into `diffusers`. Now that the official inpainting checkpoint has been released (https://github.com/runwayml/stable-diffusion), we decided to make this our official pipeline and move the old, hacky one to `StableDiffusionInpaintPipelineLegacy`.
🚀 ONNX pipelines for image2image and inpainting
Thanks to the contribution by @zledas (#552), this release supports `OnnxStableDiffusionImg2ImgPipeline` and `OnnxStableDiffusionInpaintPipeline` optimized for CPU inference:
```python
from diffusers import OnnxStableDiffusionImg2ImgPipeline, OnnxStableDiffusionInpaintPipeline

img_pipeline = OnnxStableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", revision="onnx", provider="CPUExecutionProvider"
)

inpaint_pipeline = OnnxStableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", revision="onnx", provider="CPUExecutionProvider"
)
```
🌍 Community Pipelines
Two new community pipelines have been added to `diffusers` 🔥
Stable Diffusion Interpolation example
Interpolate the latent space of Stable Diffusion between different prompts/seeds.
For more info see stable-diffusion-videos.
For a code example, see Stable Diffusion Interpolation
Stable Diffusion Mega
One Stable Diffusion Pipeline with all functionalities of Text2Image, Image2Image and Inpainting
For a code example, see Stable Diffusion Mega
- All in one Stable Diffusion Pipeline by @patrickvonplaten in #821
📝 Changelog
- [Community] One step unet by @patrickvonplaten in #840
- Remove unneeded use_auth_token by @osanseviero in #839
- Bump to 0.6.0.dev0 by @anton-l in #831
- Remove the last of ["sample"] by @anton-l in #842
- Fix Flax pipeline: width and height are ignored #838 by @camenduru in #848
- [DeviceMap] Make sure stable diffusion can be loaded from older trans… by @patrickvonplaten in #860
- Fix small community pipeline import bug and finish README by @patrickvonplaten in #869
- Fix training push_to_hub (unconditional image generation): models were not saved before pushing to hub by @pcuenca in #868
- Fix table in community README.md by @nateraw in #879
- Add generic inference example to community pipeline readme by @apolinario in #874
- Rename frame filename in interpolation community example by @nateraw in #881
- Add Apple M1 tests by @anton-l in #796
- Fix autoencoder test by @pcuenca in #886
- Rename StableDiffusionOnnxPipeline -> OnnxStableDiffusionPipeline by @anton-l in #887
- Fix DDIM on Windows not using int64 for timesteps by @hafriedlander in #819
- [dreambooth] allow fine-tuning text encoder by @patil-suraj in #883
- Stable Diffusion image-to-image and inpaint using onnx. by @zledas in #552
- Improve ONNX img2img numpy handling, temporarily fix the tests by @anton-l in #899
- [Stable Diffusion Inpainting] Deprecate inpainting pipeline in favor of official one by @patrickvonplaten in #903
- [Communit Pipeline] Make sure "mega" uses correct inpaint pipeline by @patrickvonplaten in #908
- Stable diffusion inpainting by @patil-suraj in #904
- ONNX supervised inpainting by @anton-l in #906
v0.5.1: Patch release
This patch release fixes a bug with Flax's NSFW safety checker in the pipeline.
#832 by @patil-suraj
v0.5.0: JAX/Flax and TPU support
🌾 JAX/Flax integration for super fast Stable Diffusion on TPUs.
We added JAX support for Stable Diffusion! You can now run Stable Diffusion on Colab TPUs (and GPUs too!) for faster inference.
Check out this TPU-ready Colab for a Stable Diffusion pipeline, and a detailed blog post on Stable Diffusion and parallelism in JAX / Flax 🤗 https://huggingface.co/blog/stable_diffusion_jax
The most used models, schedulers and pipelines have been ported to JAX/Flax, namely:
- Models: `FlaxAutoencoderKL`, `FlaxUNet2DConditionModel`
- Schedulers: `FlaxDDPMScheduler`, `FlaxDDIMScheduler`, `FlaxPNDMScheduler`
- Pipelines: `FlaxStableDiffusionPipeline`
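A minimal parallel-inference sketch (the `bf16` revision of the v1-4 checkpoint is an assumption; see the blog post above for the full walkthrough):

```python
import jax
import jax.numpy as jnp
from flax.jax_utils import replicate
from flax.training.common_utils import shard
from diffusers import FlaxStableDiffusionPipeline

# Revision name is an assumption; see the checkpoint's model card
pipeline, params = FlaxStableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", revision="bf16", dtype=jnp.bfloat16
)

num_devices = jax.device_count()
prompt_ids = pipeline.prepare_inputs(["a photo of an astronaut riding a horse on mars"] * num_devices)

# Replicate the params and shard the inputs across all TPU/GPU devices
params = replicate(params)
prompt_ids = shard(prompt_ids)
rng = jax.random.split(jax.random.PRNGKey(0), num_devices)

images = pipeline(prompt_ids, params, rng, jit=True).images
```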
Changelog:
- Implement FlaxModelMixin #493 by @mishig25 , @patil-suraj, @patrickvonplaten , @pcuenca
- Karras VE, DDIM and DDPM flax schedulers #508 by @kashif
- initial flax pndm scheduler #492 by @kashif
- FlaxDiffusionPipeline & FlaxStableDiffusionPipeline #559 by @mishig25 , @patrickvonplaten , @pcuenca
- Flax pipeline pndm #583 by @pcuenca
- Add from_pt argument in .from_pretrained #527 by @younesbelkada
- Make flax from_pretrained work with local subfolder #608 by @mishig25
🔥 DeepSpeed low-memory training
Thanks to the 🤗 `accelerate` integration with DeepSpeed, a few of our training examples became even more optimized in terms of VRAM and speed:
- DreamBooth is now trainable on 8GB GPUs thanks to a contribution from @Ttl! Find out how to run it here.
- The Text2Image finetuning example is also fully compatible with DeepSpeed.
✏️ Changelog
- Revert "[v0.4.0] Temporarily remove Flax modules from the public API" by @anton-l in #755
- Fix push_to_hub for dreambooth and textual_inversion by @YaYaB in #748
- Fix ONNX conversion script opset argument type by @justinchuby in #739
- Add final latent slice checks to SD pipeline intermediate state tests by @jamestiotio in #731
- fix(DDIM scheduler): use correct dtype for noise by @keturn in #742
- [Tests] Fix tests by @patrickvonplaten in #774
- debug an exception by @LowinLi in #638
- Clean up resnet.py file by @natolambert in #780
- add sigmoid betas by @natolambert in #777
- [Low CPU memory] + device map by @patrickvonplaten in #772
- Fix gradient checkpointing test by @patrickvonplaten in #797
- fix typo docstring in unet2d by @natolambert in #798
- DreamBooth DeepSpeed support for under 8 GB VRAM training by @Ttl in #735
- support bf16 for stable diffusion by @patil-suraj in #792
- stable diffusion fine-tuning by @patil-suraj in #356
- Flax: Trickle down `norm_num_groups` by @akash5474 in #789
- Eventually preserve this typo? :) by @spezialspezial in #804
- Fix indentation in the code example by @osanseviero in #802
- [Img2Img] Fix batch size mismatch prompts vs. init images by @patrickvonplaten in #793
- Minor package fixes by @anton-l in #809
- [Dummy imports] Better error message by @patrickvonplaten in #795
- add or fix license formatting in models directory by @natolambert in #808
- [train_text2image] Fix EMA and make it compatible with deepspeed. by @patil-suraj in #813
- Fix fine-tuning compatibility with deepspeed by @pink-red in #816
- Add diffusers version and pipeline class to the Hub UA by @anton-l in #814
- [Flax] Add test by @patrickvonplaten in #824
- update flax scheduler API by @patil-suraj in #822
- Fix dreambooth loss type with prior_preservation and fp16 by @anton-l in #826
- Fix type mismatch error, add tests for negative prompts by @anton-l in #823
- Give more customizable options for safety checker by @patrickvonplaten in #815
- Flax safety checker by @pcuenca in #825
- Align PT and Flax API - allow loading checkpoint from PyTorch configs by @patrickvonplaten in #827
v0.4.2: Patch release
This patch release allows the img2img pipeline to be run in fp16 and fixes a bug with the "mps" device.
v0.4.1: Patch release
This patch release fixes a bug with incorrect module naming for community pipelines and an incorrect breaking change when moving pipelines in fp16 to "cpu" or "mps".
v0.4.0: Better, faster, stronger!
🚗 Faster
We have thoroughly profiled our codebase and applied a number of incremental improvements that, when combined, provide a speed improvement of almost 3x.
On top of that, we now default to using the `float16` format. It's much faster than `float32` and, according to our tests, produces images with no discernible difference in quality. This beats the use of `autocast`, so the resulting code is cleaner!
🔑 `use_auth_token` no more
The recently released version of `huggingface-hub` automatically uses your access token if you are logged in, so you don't need to put it everywhere in your code. All you need to do is authenticate once using `huggingface-cli login` in your terminal and you're all set.
```diff
- pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=True)
+ pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
```
We bumped the `huggingface-hub` version to 0.10.0 in our dependencies to achieve this.
🎈 More flexible APIs
- Schedulers now use a common, simpler unified API design. This has allowed us to remove many conditionals and special cases in the rest of the code, including the pipelines. This is very important for us and for the users of 🧨 diffusers: we all gain clarity and a solid abstraction for schedulers. See the description in #719 for more details
Please update any custom Stable Diffusion pipelines accordingly:
```diff
- if isinstance(self.scheduler, LMSDiscreteScheduler):
-     latents = latents * self.scheduler.sigmas[0]
+ latents = latents * self.scheduler.init_noise_sigma

- if isinstance(self.scheduler, LMSDiscreteScheduler):
-     sigma = self.scheduler.sigmas[i]
-     latent_model_input = latent_model_input / ((sigma**2 + 1) ** 0.5)
+ latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)

- if isinstance(self.scheduler, LMSDiscreteScheduler):
-     latents = self.scheduler.step(noise_pred, i, latents, **extra_step_kwargs).prev_sample
- else:
-     latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs).prev_sample
+ latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs).prev_sample
```
- Pipeline callbacks. As a community project (h/t @jamestiotio!), `diffusers` pipelines can now invoke a callback function during generation, providing the latents at each step of the process. This makes it easier to perform tasks such as visualization, inspection, explainability and others the community may invent; see the sketch below.
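A minimal sketch of the callback API (argument names follow the PR; the logging body is just an example):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to("cuda")

# Called every `callback_steps` denoising steps with the current latents
def log_latents(step: int, timestep: int, latents: torch.FloatTensor):
    print(f"step {step}: latents with shape {tuple(latents.shape)}")

image = pipe("a red car", callback=log_latents, callback_steps=5).images[0]
```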
🛠️ More tasks
Building on top of the previous foundations, this release incorporates several new tasks that have been adapted from research papers or community projects. These include:
- Textual inversion. Makes it possible to quickly train a new concept or style and incorporate it into the vocabulary of Stable Diffusion. Hundreds of people have already created theirs, and they can be shared and combined together. See the training Colab to get started.
- Dreambooth. Similar goal to textual inversion, but instead of creating a new item in the vocabulary it fine-tunes the model to make it learn a new concept. Training Colab.
- Negative prompts. Another community effort led by @shirayu. The Stable Diffusion pipeline can now receive both a positive prompt (the one you want to create), and a negative prompt (something you want to drive the model away from). This opens up a lot of creative possibilities!
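For example, negative prompts plug straight into the standard pipeline call (a minimal sketch; the prompt contents are illustrative):

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to("cuda")

# The model is guided toward `prompt` and away from `negative_prompt`
image = pipe(
    prompt="a photo of an astronaut riding a horse on mars",
    negative_prompt="blurry, low resolution",
).images[0]
```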
🏃‍♀️ Under the hood changes to support better fine-tuning
Gradient checkpointing and 8-bit optimizers have been successfully applied to achieve Dreambooth fine-tuning in a Colab notebook! These updates will make it easier for `diffusers` to support general-purpose fine-tuning (coming soon!).
⚠️ Experimental: community pipelines
This is big, but it's still an experimental feature that may change in the future.
We are constantly amazed at the amount of imagination and creativity in the `diffusers` community, so we've made it easy to create custom pipelines and share them with others. You can write your own pipeline code, store it in the 🤗 Hub, GitHub or your local filesystem, and `StableDiffusionPipeline.from_pretrained` will be able to load and run it. Read more in the documentation.
We can't wait to see what new tasks the community creates!
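A hedged sketch of how loading a custom pipeline can look (the `custom_pipeline` argument follows the documented mechanism; the pipeline id below is hypothetical):

```python
from diffusers import DiffusionPipeline

# `custom_pipeline` can name a pipeline from the Hub, GitHub or a local folder;
# the id below is hypothetical
pipe = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    custom_pipeline="my_org/my_custom_pipeline",
)
```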
💪 Quality of life fixes
Bug fixing, improved documentation, and better tests are all important to ensure `diffusers` is a high-quality codebase, and we always spend a lot of effort working on them. Several first-time contributors have helped here, and we are very grateful for their efforts!
🙏 Significant community contributions
The following people have made significant contributions to the library over the last release:
- @Victarry – Add training example for DreamBooth (#554)
- @jamestiotio – Add callback parameters for Stable Diffusion pipelines (#521)
- @jachiam – Allow resolutions that are not multiples of 64 (#505)
- @johnowhitaker – Adding pred_original_sample to SchedulerOutput for some samplers (#614).
- @keturn – Interesting discussions and insights on many topics.
✏️ Change list
- [Docs] Correct links by @patrickvonplaten in #432
- [Black] Update black by @patrickvonplaten in #433
- use torch.matmul instead of einsum in attention. by @patil-suraj in #445
- Renamed variables from single letter to better naming by @daspartho in #449
- Docs: fix installation typo by @daspartho in #453
- fix table formatting for stable diffusion pipeline doc (add blank line) by @natolambert in #471
- update expected results of slow tests by @kashif in #268
- [Flax] Make room for more frameworks by @patrickvonplaten in #494
- Fix `disable_attention_slicing` in pipelines by @pcuenca in #498
- Rename test_scheduler_outputs_equivalence in model tests. by @pcuenca in #451
- Scheduler docs update by @natolambert in #464
- Fix scheduler inference steps error with power of 3 by @natolambert in #466
- initial flax pndm scheduler by @kashif in #492
- Fix vae tests for cpu and gpu by @kashif in #480
- [Docs] Add subfolder docs by @patrickvonplaten in #500
- docs: fix broken doc links for relative links by @jjmachan in #504
- Removing `.float()` (`autocast` in fp16 will discard this (I think)). by @Narsil in #495
- Fix MPS scheduler indexing when using `mps` by @pcuenca in #450
- [CrossAttention] add different method for sliced attention by @patil-suraj in #446
- Implement `FlaxModelMixin` by @mishig25 in #493
- Karras VE, DDIM and DDPM flax schedulers by @kashif in #508
- [UNet2DConditionModel, UNet2DModel] pass norm_num_groups to all the blocks by @patil-suraj in #442
- Add `init_weights` method to `FlaxMixin` by @mishig25 in #513
- UNet Flax with FlaxModelMixin by @pcuenca in #502
- Stable diffusion text2img conversion script. by @patil-suraj in #154
- [CI] Add stalebot by @anton-l in #481
- Fix is_onnx_available by @SkyTNT in #440
- [Tests] Test attention.py by @sidthekidder in #368
- Finally fix the image-based SD tests by @anton-l in #509
- Remove the usage of numpy in up/down sample_2d by @ydshieh in #503
- Fix typos and add Typo check GitHub Action by @shirayu in #483
- Quick fix for the img2img tests by @anton-l in #530
- [Tests] Fix spatial transformer tests on GPU by @anton-l in #531
- [StableDiffusionInpaintPipeline] accept tensors for init and mask image by @patil-suraj in #439
- adding more typehints to DDIM scheduler by @vishnu-anirudh in #456
- Revert "adding more typehints to DDIM scheduler" by @patrickvonplaten in #533
- Add LMSDiscreteSchedulerTest by @sidthekidder in #467
- [Download] Smart downloading by @patrickvonplaten in #512
- [Hub] Update hub version by @patrickvonplaten in #538
- Unify offset configuration in DDIM and PNDM schedulers by @jonatanklosko in #479
- [Configuration] Better logging by @patrickvonplaten in #545
- `make fixup` support by @younesbelkada in #546
- FlaxUNet2DConditionOutput @flax.struct.dataclass by @mishig25 in #550
- [Flax] fix Flax scheduler by @kashif in #564
- JAX/Flax safety checker by @pcuenca in #558
- Flax: ignore dtype for configuration by @pcuenca in #565
- Remove check_tf_utils to avoid an unnecessary TF import for now by @anton-l in #566
- Fix `_upsample_2d` by @ydshieh in #535
- [Flax] Add Vae for Stable Diffusion by @patrickvonplaten in #555
- [Flax] Solve problem with VAE by @patrickvonplaten in #574
- [Tests] Upload custom test artifacts by @anton-l in #572
- [Tests] Mark the ncsnpp model tests as slow by @anton-l in #575
- [examples/community] add CLIPGuidedStableDiffusion by @patil-suraj in #561
- Fix `CrossAttention._sliced_attention` by @ydshieh in #563
- Fix typos by @shirayu in #568
- Add `from_pt` argument in `.from_pretrained` by @younesbelkada in #527
- [FlaxAutoencoderKL] rename weights to align with PT by @patil-suraj in #584
- Fix BaseOutput initialization from dict by @anton-l in #570
- Add the K-LMS scheduler to the inpainting pipeline + tests by @anton-l in #587
- [flax safety checker] Use `FlaxPreTrainedModel` for saving/loading by @patil-suraj in #591
- FlaxDiffusionPipeline & FlaxStableDiffusionPipeline by @mishig25 in #559
- [Flax] Fix unet and ddim scheduler by @patrickvonplate...