
v0.7.0: Optimized for Apple Silicon, Improved Performance, Awesome Community

Released by @patrickvonplaten · 03 Nov 18:44

❤️ PyTorch + Accelerate

⚠️ The PyTorch pipelines now require accelerate for improved model loading times!
Install Diffusers with pip install --upgrade diffusers[torch] to get everything in a single command.

🍎 Apple Silicon support with PyTorch 1.13

PyTorch and Apple have been working on improving mps support in PyTorch 1.13, so Apple Silicon is now a first-class citizen in diffusers 0.7.0!

Requirements

  • Mac computer with Apple silicon (M1/M2) hardware.
  • macOS 12.6 or later (13.0 or later recommended, as support is even better).
  • arm64 version of Python.
  • PyTorch 1.13.0 official release, installed from pip or the conda channels.
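
If you want to quickly verify that your setup meets these requirements, a minimal check (PyTorch's mps backend flags have been available since 1.12) looks like this:

import platform

import torch

# Should print "arm64" when running a native Apple Silicon build of Python
print(platform.machine())

# Both should print True on PyTorch 1.13 with MPS support
print(torch.backends.mps.is_built())
print(torch.backends.mps.is_available())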

Memory efficient generation

Memory management is crucial to achieve fast generation speed. We recommend always using attention slicing on Apple Silicon, as it drastically reduces memory pressure and prevents paging or swapping. This is especially important for computers with less than 64 GB of unified memory, and may make the difference between generating an image in seconds rather than minutes. Use it like this:

from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("mps")

# Recommended if your computer has < 64 GB of RAM
pipe.enable_attention_slicing()

prompt = "a photo of an astronaut riding a horse on mars"

# First-time "warmup" pass
_ = pipe(prompt, num_inference_steps=1)

image = pipe(prompt).images[0]
image.save("astronaut.png")

Continuous Integration

Our automated tests now include a full battery of tests on the mps device. This will help us identify issues early and ensure quality on Apple Silicon going forward.

See more details in the documentation.

💃 Dance Diffusion

diffusers goes audio 🎵 Dance Diffusion by Harmonai is the first audio model in 🧨Diffusers!

Try it out to generate some random music:

from diffusers import DiffusionPipeline
import scipy.io.wavfile

model_id = "harmonai/maestro-150k"
pipeline = DiffusionPipeline.from_pretrained(model_id)
pipeline = pipeline.to("cuda")

audio = pipeline(audio_length_in_s=4.0).audios[0]

# To save locally
scipy.io.wavfile.write("maestro_test.wav", pipeline.unet.sample_rate, audio.transpose())

🎉 Euler schedulers

These are the Euler schedulers, from the paper Elucidating the Design Space of Diffusion-Based Generative Models by Karras et al. (2022). The diffusers implementation is based on the original k-diffusion implementation by Katherine Crowson. The Euler schedulers are fast and often generate very good outputs with just 20-30 steps.

import torch

from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

euler_scheduler = EulerDiscreteScheduler.from_config("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", scheduler=euler_scheduler, revision="fp16", torch_dtype=torch.float16
)
pipeline.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipeline(prompt, num_inference_steps=25).images[0]

import torch

from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler

euler_ancestral_scheduler = EulerAncestralDiscreteScheduler.from_config("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", scheduler=euler_scheduler, revision="fp16", torch_dtype=torch.float16
)
pipeline.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipeline(prompt, num_inference_steps=25).images[0]

🔥 Up to 2x faster inference with memory_efficient_attention

Even faster and more memory-efficient Stable Diffusion, using the efficient flash attention implementation from xformers.

  • Up to 2x speedup on GPUs using memory efficient attention by @MatthieuTPHR #532

To leverage it, just make sure you have:

  • PyTorch > 1.12
  • CUDA available
  • The xformers library installed

from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    revision="fp16",
    torch_dtype=torch.float16,
).to("cuda")

pipe.enable_xformers_memory_efficient_attention()

with torch.inference_mode():
    sample = pipe("a small cat")

# optional: You can disable it via
# pipe.disable_xformers_memory_efficient_attention()

🚀 Much faster loading

Thanks to accelerate, pipeline loading is much, much faster. There are two parts to it:

  • First, when a model is created, PyTorch initializes its weights by default, which takes a good amount of time. With low_cpu_mem_usage (enabled by default), no such initialization is performed.
  • Optionally, you can also use device_map="auto" to automatically select the best device(s) to which the pre-trained weights will be initially sent.

In our tests, loading time was more than halved on CUDA devices, and went down from 12s to 4s on an Apple M1 computer.

As a side effect, CPU usage will be greatly reduced during loading, because no temporary copies of the weights are necessary.

This feature requires PyTorch 1.9 or better and accelerate 0.8.0 or higher.
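
As a minimal sketch of what this looks like in practice (low_cpu_mem_usage is shown explicitly even though it is the default, and device_map="auto" requires accelerate to be installed):

from diffusers import StableDiffusionPipeline

# Skip PyTorch's default weight initialization and let accelerate
# place the pre-trained weights on the best available device(s).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    low_cpu_mem_usage=True,
    device_map="auto",
)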

🎨 RePaint

RePaint allows reusing any pretrained DDPM model for free-form inpainting by adding restarts to the denoising schedule. Based on the paper RePaint: Inpainting using Denoising Diffusion Probabilistic Models by Andreas Lugmayr et al.

import torch

from diffusers import RePaintPipeline, RePaintScheduler

# Load the RePaint scheduler and pipeline based on a pretrained DDPM model
scheduler = RePaintScheduler.from_config("google/ddpm-ema-celebahq-256")
pipe = RePaintPipeline.from_pretrained("google/ddpm-ema-celebahq-256", scheduler=scheduler)
pipe = pipe.to("cuda")

# original_image and mask_image are PIL images of the same size that you load beforehand
generator = torch.Generator(device="cuda").manual_seed(0)
output = pipe(
    original_image=original_image,
    mask_image=mask_image,
    num_inference_steps=250,
    eta=0.0,
    jump_length=10,
    jump_n_sample=10,
    generator=generator,
)
inpainted_image = output.images[0]


🌍 Community Pipelines

Long Prompt Weighting Stable Diffusion

This pipeline lets you use prompts longer than the 77-token limit, increase the weight of words with "()", and decrease it with "[]". It also covers the main use cases of the Stable Diffusion pipeline in a single class.
For a code example, see Long Prompt Weighting Stable Diffusion

  • [Community Pipelines] Long Prompt Weighting Stable Diffusion Pipelines by @SkyTNT in #907
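
As a rough sketch, the pipeline can be loaded through the custom_pipeline mechanism (the "lpw_stable_diffusion" id below refers to the community pipeline; see the linked example for canonical usage):

import torch

from diffusers import DiffusionPipeline

# Load the community pipeline on top of a regular Stable Diffusion checkpoint
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="lpw_stable_diffusion",
    revision="fp16",
    torch_dtype=torch.float16,
).to("cuda")

# "()" increases the weight of a word, "[]" decreases it
prompt = "a photo of an astronaut riding a (majestic) horse on [cloudy] mars"
image = pipe(prompt).images[0]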

Speech to Image

Generate an image from an audio sample using pre-trained OpenAI whisper-small and Stable Diffusion.
For a code example, see Speech to Image

Wildcard Stable Diffusion

A minimal implementation that allows users to add "wildcards", denoted by __wildcard__, to prompts. Wildcards act as placeholders for values randomly sampled from either a dictionary or a .txt file.
For a code example, see Wildcard Stable Diffusion
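
As an illustrative sketch only: loading works through the same custom_pipeline mechanism, and wildcard values are supplied at call time (the wildcard_option_dict, wildcard_files, and num_prompt_samples keyword arguments below are taken from the community pipeline's example and may differ, so check the linked code):

import torch

from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="wildcard_stable_diffusion",
    torch_dtype=torch.float16,
).to("cuda")

# __animal__ and __object__ are placeholders replaced by randomly sampled values
prompt = "__animal__ sitting on a __object__"

# Wildcard values can come from an in-memory dictionary or from .txt files
# (one option per line); these keyword arguments are assumptions from the
# community pipeline, not part of the core API.
image = pipe(
    prompt,
    wildcard_option_dict={"animal": ["cat", "dog", "fox"]},
    wildcard_files=["object.txt"],
    num_prompt_samples=1,
).images[0]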

Composable Stable Diffusion

Use logic operators to do compositional generation.
For a code example, see Composable Stable Diffusion

  • Add Composable diffusion to community pipeline examples by @MarkRich in #951

Imagic Stable Diffusion

Image editing with Stable Diffusion.
For a code example, see Imagic Stable Diffusion

Seed Resizing

Allows generating a larger image while keeping the content of the original image.
For a code example, see Seed Resizing

📝 Changelog