Let us democratise high-resolution generation! (CVPR 2024)
Code release for "DemoFusion: Democratising High-Resolution Image Generation With No 💰"
Abstract: High-resolution image generation with Generative Artificial Intelligence (GenAI) has immense potential but, due to the enormous capital investment required for training, it is increasingly centralised to a few large corporations, and hidden behind paywalls. This paper aims to democratise high-resolution GenAI by advancing the frontier of high-resolution generation while remaining accessible to a broad audience. We demonstrate that existing Latent Diffusion Models (LDMs) possess untapped potential for higher-resolution image generation. Our novel DemoFusion framework seamlessly extends open-source GenAI models, employing Progressive Upscaling, Skip Residual, and Dilated Sampling mechanisms to achieve higher-resolution image generation. The progressive nature of DemoFusion requires more passes, but the intermediate results can serve as "previews", facilitating rapid prompt iteration.
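As a rough illustration of Progressive Upscaling, the sketch below lists the intermediate resolutions a run might pass through on the way to a target size. It assumes a 1024-pixel base resolution (the native size of SDXL) and one phase per additional 1024 pixels on the long side. progressive_resolutions is a hypothetical helper, not part of the released pipeline, and the actual schedule may differ.

```python
import math


def progressive_resolutions(height, width, base=1024):
    """Return the (height, width) of each upscaling phase, smallest first.

    Assumption: the number of phases is ceil(long_side / base) and every
    phase keeps the aspect ratio of the target. Illustrative only.
    """
    long_side = max(height, width)
    num_phases = max(1, math.ceil(long_side / base))
    phases = []
    for k in range(1, num_phases + 1):
        # Scale both sides by k / num_phases, then snap to a multiple of 8
        # so the size stays compatible with an 8x-downsampling VAE.
        h = round(height * k / num_phases / 8) * 8
        w = round(width * k / num_phases / 8) * 8
        phases.append((h, w))
    return phases


print(progressive_resolutions(3072, 3072))
# [(1024, 1024), (2048, 2048), (3072, 3072)]
```

Under these assumptions every phase before the last runs at a lower resolution than the target, which is what lets intermediate results act as previews.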
Check out pipeline_demofusion_sdxl_controlnet! The local Gradio Demo is also available.
Check out pipeline_demofusion_sdxl now! The local Gradio Demo is also available.
pipeline_demofusion_sdxl is released.

view_batch_size (int, defaults to 16)
stride (int, defaults to 64)
cosine_scale_1 (float, defaults to 3)
cosine_scale_2 (float, defaults to 1)
cosine_scale_3 (float, defaults to 1)
sigma (float, defaults to 1)
multi_decoder (bool, defaults to True)
show_image (bool, defaults to False)

conda create -n demofusion python=3.9
conda activate demofusion
pip install -r requirements.txt
With pipeline_demofusion_sdxl.py importable from your working directory, run it as follows. A use case can be found in demo.ipynb.

from pipeline_demofusion_sdxl import DemoFusionSDXLPipeline
import torch

# Load the SDXL base checkpoint in half precision and move it to the GPU.
model_ckpt = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = DemoFusionSDXLPipeline.from_pretrained(model_ckpt, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "Envision a portrait of an elderly woman, her face a canvas of time, framed by a headscarf with muted tones of rust and cream. Her eyes, blue like faded denim. Her attire, simple yet dignified."
negative_prompt = "blurry, ugly, duplicate, poorly drawn, deformed, mosaic"

# Generate at 3072x3072.
images = pipe(
    prompt, negative_prompt=negative_prompt,
    height=3072, width=3072, view_batch_size=16, stride=64,
    num_inference_steps=50, guidance_scale=7.5,
    cosine_scale_1=3, cosine_scale_2=1, cosine_scale_3=1, sigma=0.8,
    multi_decoder=True, show_image=True,
)

# Save every returned image under its own index.
for i, image in enumerate(images):
    image.save('image_' + str(i) + '.png')
Setting multi_decoder=False can make the decoding process faster.
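To get a feel for how stride and view_batch_size interact with the target resolution, the sketch below estimates how many sliding-window views one denoising step would cover and how many batches that takes. It is a back-of-the-envelope model under stated assumptions, not the pipeline's own code: count_views is a hypothetical helper, the window is assumed to be 128 latent pixels (1024 image pixels), stride is assumed to be measured in latent pixels, and the VAE is assumed to downsample by 8.

```python
import math


def count_views(height, width, stride=64, view_batch_size=16,
                window=128, vae_factor=8):
    """Estimate (views, batches) for one denoising step.

    Assumptions (not stated in this README): a square window of `window`
    latent pixels, `stride` in latent pixels, VAE downsampling by
    `vae_factor`.
    """
    lat_h, lat_w = height // vae_factor, width // vae_factor

    def blocks(length):
        # One window if the latent fits inside it; otherwise enough
        # strided windows to reach the far edge.
        if length <= window:
            return 1
        return math.ceil((length - window) / stride) + 1

    views = blocks(lat_h) * blocks(lat_w)
    batches = math.ceil(views / view_batch_size)
    return views, batches


print(count_views(3072, 3072, stride=64, view_batch_size=16))  # (25, 2)
print(count_views(2048, 2048, stride=64, view_batch_size=4))   # (9, 3)
```

In this model a smaller stride yields more overlapping views per step, while a larger view_batch_size reduces the number of batches at the cost of more memory per batch.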
git clone "https://github.com/PRIS-CV/DemoFusion"
cd DemoFusion
python -m venv venv
venv\Scripts\activate
pip install -U "xformers==0.0.22.post7+cu118" --index-url https://download.pytorch.org/whl/cu118
pip install "diffusers==0.21.4" "matplotlib==3.8.2" "transformers==4.35.2" "accelerate==0.25.0"
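After installing, a quick check that the packages can be found may save a failed first run. The snippet below is a minimal sketch using only the standard library. missing_packages is a hypothetical helper, and the list of names is an assumption based on the install commands and imports shown here.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level import names assumed from the install commands in this README.
required = ["torch", "diffusers", "transformers", "accelerate",
            "matplotlib", "xformers"]

missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```

This only checks that each package is discoverable. It does not verify versions or that CUDA is usable.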
A use case can be found in demo_lowvram.py.

from pipeline_demofusion_sdxl import DemoFusionSDXLPipeline
import torch
from diffusers.models import AutoencoderKL

# Use the fp16-fixed SDXL VAE so decoding works in half precision.
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)

model_ckpt = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = DemoFusionSDXLPipeline.from_pretrained(model_ckpt, torch_dtype=torch.float16, vae=vae)
pipe = pipe.to("cuda")

prompt = "Envision a portrait of an elderly woman, her face a canvas of time, framed by a headscarf with muted tones of rust and cream. Her eyes, blue like faded denim. Her attire, simple yet dignified."
negative_prompt = "blurry, ugly, duplicate, poorly drawn, deformed, mosaic"

# Smaller resolution and view batch, with lowvram=True, to fit limited GPU memory.
images = pipe(
    prompt, negative_prompt=negative_prompt,
    height=2048, width=2048, view_batch_size=4, stride=64,
    num_inference_steps=40, guidance_scale=7.5,
    cosine_scale_1=3, cosine_scale_2=1, cosine_scale_3=1, sigma=0.8,
    multi_decoder=True, show_image=False, lowvram=True,
)

# Save every returned image under its own index.
for i, image in enumerate(images):
    image.save('image_' + str(i) + '.png')
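The cosine_scale_1, cosine_scale_2 and cosine_scale_3 arguments are passed without any description of their effect. As intuition only, and as an assumption this README does not confirm, a parameter of this kind can act as an exponent on a cosine decay over the denoising timesteps, so that a larger value makes the associated term fade sooner. cosine_weight below is a hypothetical helper illustrating that idea. The schedule actually used in pipeline_demofusion_sdxl.py may differ.

```python
import math


def cosine_weight(t, num_train_timesteps=1000, scale=3.0):
    """Cosine-decay weight in [0, 1], raised to the power `scale`.

    Assumption: t counts down from num_train_timesteps (noisiest step)
    to 0 (clean image), so the weight starts at 1 and decays to 0.
    """
    progress = (num_train_timesteps - t) / num_train_timesteps
    factor = 0.5 * (1.0 + math.cos(math.pi * progress))
    return factor ** scale


# Compare how fast the weight fades for two exponents.
for t in (1000, 750, 500, 250, 0):
    print(t, round(cosine_weight(t, scale=1.0), 4), round(cosine_weight(t, scale=3.0), 4))
```

With scale=1 the weight at the midpoint is 0.5, and with scale=3 it is 0.125, so under this reading a higher exponent shifts the term's influence toward the early, noisy steps.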
The local Gradio demos require gradio and gradio_imageslider.
Launch the Gradio demo with python gradio_demo.py. Better Interaction and Presentation!
Launch the image-to-image Gradio demo with python gradio_demo_img2img.py.

If you find this paper useful in your research, please consider citing:
@inproceedings{du2024demofusion,
title={DemoFusion: Democratising High-Resolution Image Generation With No \$\$\$},
author={Du, Ruoyi and Chang, Dongliang and Hospedales, Timothy and Song, Yi-Zhe and Ma, Zhanyu},
booktitle={CVPR},
year={2024}
}