Kolors

Kolors Team

APACHE-2.0 License


Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis

Open-source Plan

  • Kolors (Text-to-Image Model)
    • Inference
    • Checkpoints
    • IP-Adapter
    • ControlNet (Canny, Depth)
    • Inpainting
    • IP-Adapter-FaceID
    • LoRA
    • ControlNet (Pose)
  • ComfyUI
  • Gradio
  • Diffusers

Introduction

Kolors is a large-scale text-to-image generation model based on latent diffusion, developed by the Kuaishou Kolors team. Trained on billions of text-image pairs, Kolors exhibits significant advantages over both open-source and closed-source models in visual quality, complex semantic accuracy, and text rendering for both Chinese and English characters. Furthermore, Kolors supports both Chinese and English inputs, demonstrating strong performance in understanding and generating Chinese-specific content. For more details, please refer to this technical report.

Evaluation

We have collected a comprehensive text-to-image evaluation dataset named KolorsPrompts to compare Kolors with other state-of-the-art open-source and closed-source models. KolorsPrompts includes over 1,000 prompts across 14 categories and 12 evaluation dimensions. The evaluation process incorporates both human and machine assessments. In both assessments, Kolors was highly competitive and obtained the highest overall score among the compared models.

Human Assessment

For the human evaluation, we invited 50 imagery experts to conduct comparative evaluations of the results generated by different models. The experts rated the generated images based on three criteria: visual appeal, text faithfulness, and overall satisfaction. In the evaluation, Kolors achieved the highest overall satisfaction score and significantly led in visual appeal compared to other models.

| Model | Average Overall Satisfaction | Average Visual Appeal | Average Text Faithfulness |
| --- | --- | --- | --- |
| Adobe-Firefly | 3.03 | 3.46 | 3.84 |
| Stable Diffusion 3 | 3.26 | 3.50 | 4.20 |
| DALL-E 3 | 3.32 | 3.54 | 4.22 |
| Midjourney-v5 | 3.32 | 3.68 | 4.02 |
| Playground-v2.5 | 3.37 | 3.73 | 4.04 |
| Midjourney-v6 | 3.58 | 3.92 | 4.18 |
| Kolors | 3.59 | 3.99 | 4.17 |

All models were tested using their April 2024 product versions.

Machine Assessment

We used MPS (Multi-dimensional Human Preference Score) on KolorsPrompts as the evaluation metric for machine assessment. Kolors achieved the highest MPS score, which is consistent with the results of the human evaluations.

| Model | Overall MPS |
| --- | --- |
| Adobe-Firefly | 8.5 |
| Stable Diffusion 3 | 8.9 |
| DALL-E 3 | 9.0 |
| Midjourney-v5 | 9.4 |
| Playground-v2.5 | 9.8 |
| Midjourney-v6 | 10.2 |
| Kolors | 10.3 |

For more experimental results and details, please refer to our technical report.
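
The agreement between the machine ranking and the human ranking can be checked directly from the two score tables. The sketch below copies the overall satisfaction and overall MPS columns and counts the model pairs that the two metrics order in opposite directions. The helper and variable names are illustrative and are not part of the Kolors repository.

```python
from itertools import combinations

# Scores copied from the human assessment table (average overall satisfaction).
human_overall = {
    "Adobe-Firefly": 3.03,
    "Stable Diffusion 3": 3.26,
    "DALL-E 3": 3.32,
    "Midjourney-v5": 3.32,
    "Playground-v2.5": 3.37,
    "Midjourney-v6": 3.58,
    "Kolors": 3.59,
}

# Scores copied from the machine assessment table (overall MPS).
mps = {
    "Adobe-Firefly": 8.5,
    "Stable Diffusion 3": 8.9,
    "DALL-E 3": 9.0,
    "Midjourney-v5": 9.4,
    "Playground-v2.5": 9.8,
    "Midjourney-v6": 10.2,
    "Kolors": 10.3,
}


def top_model(scores):
    """Return the model with the highest score."""
    return max(scores, key=scores.get)


def discordant_pairs(a, b):
    """Pairs of models that metric `a` and metric `b` rank in opposite order.

    Ties in either metric are not counted as disagreements.
    """
    return [
        (m, n)
        for m, n in combinations(a, 2)
        if (a[m] - a[n]) * (b[m] - b[n]) < 0
    ]


if __name__ == "__main__":
    print("top (human):", top_model(human_overall))
    print("top (MPS):  ", top_model(mps))
    print("discordant pairs:", discordant_pairs(human_overall, mps))
```

With the values above, no pair of models is ordered inconsistently (DALL-E 3 and Midjourney-v5 tie on overall satisfaction), and Kolors ranks first under both metrics.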

Visualization

  • High-quality Portrait
  • Chinese Elements Generation
  • Complex Semantic Understanding
  • Text Rendering

The visualized case prompts mentioned above can be accessed here.

Usage

Requirements

  • Python 3.8 or later
  • PyTorch 1.13.1 or later
  • Transformers 4.26.1 or later
  • Recommended: CUDA 11.7 or later
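
The version minimums listed above can be checked before installing anything else. The following is a minimal sketch, assuming the packages are installed under their usual PyPI names (`torch`, `transformers`); the function names are hypothetical helpers, not part of the Kolors code.

```python
import re
import sys
from importlib import metadata

# Minimum versions copied from the requirements list above.
MINIMUMS = {"torch": "1.13.1", "transformers": "4.26.1"}


def version_tuple(text):
    """Leading numeric components of a version string.

    For example '2.1.0+cu118' becomes (2, 1, 0); local build tags are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def meets(installed, minimum):
    """True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_environment():
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8 or later is required")
    for package, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed (need >= {minimum})")
            continue
        if not meets(installed, minimum):
            problems.append(f"{package} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment looks OK"]:
        print(line)
```

The CUDA recommendation is not checked here, since it depends on the local driver and on how PyTorch was built.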

  1. Repository cloning and dependency installation
apt-get install git-lfs
git clone https://github.com/Kwai-Kolors/Kolors
cd Kolors
conda create --name kolors python=3.8
conda activate kolors
pip install -r requirements.txt
python3 setup.py install
  2. Weights download (link)
huggingface-cli download --resume-download Kwai-Kolors/Kolors --local-dir weights/Kolors

or

git lfs clone https://huggingface.co/Kwai-Kolors/Kolors weights/Kolors
  3. Inference
python3 scripts/sample.py ""
# The image will be saved to "scripts/outputs/sample_text.jpg"
  4. Web demo
python3 scripts/sampleui.py
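
The weights download step can also be scripted from Python. This is a sketch under the assumption that the `huggingface_hub` package is installed; `snapshot_download` is its function for fetching a whole repository. The helper names `local_dir_for` and `download_weights` are hypothetical, and the target layout simply mirrors the `weights/<repo name>` convention used by the commands in this README.

```python
from pathlib import Path


def local_dir_for(repo_id, root="weights"):
    """Map 'Kwai-Kolors/Kolors' to weights/Kolors, as in the CLI commands."""
    return Path(root) / repo_id.split("/", 1)[1]


def download_weights(repo_id="Kwai-Kolors/Kolors", root="weights"):
    """Download every file of a Hugging Face repository into weights/<name>."""
    # Imported lazily so the path helper above works without the dependency.
    from huggingface_hub import snapshot_download

    target = local_dir_for(repo_id, root)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target


if __name__ == "__main__":
    print("weights stored in", download_weights())
```

The same helper should work for the other weight repositories named in this README by passing a different `repo_id`.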

Using with Diffusers

Make sure you upgrade to the latest version (0.30.0.dev0) of diffusers:

git clone https://github.com/huggingface/diffusers
cd diffusers
python3 setup.py install

Notes:

  • The pipeline uses the EulerDiscreteScheduler by default. We recommend using this scheduler with guidance_scale=5.0 and num_inference_steps=50.
  • The pipeline also supports the EDMDPMSolverMultistepScheduler; guidance_scale=5.0 and num_inference_steps=25 are good defaults for this scheduler.
  • In addition to text-to-image generation, image-to-image generation is supported through KolorsImg2ImgPipeline.

And then you can run:

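import torch
from diffusers import KolorsPipeline

# Load the fp16 weights and move the pipeline to the GPU.
pipe = KolorsPipeline.from_pretrained(
    "Kwai-Kolors/Kolors-diffusers",
    torch_dtype=torch.float16,
    variant="fp16"
).to("cuda")

# Prompt text
prompt = '""'

# Recommended settings for the default EulerDiscreteScheduler;
# the fixed seed makes the result reproducible.
image = pipe(
    prompt=prompt,
    negative_prompt="",
    guidance_scale=5.0,
    num_inference_steps=50,
    generator=torch.Generator(pipe.device).manual_seed(66),
).images[0]
image.show()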
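
The scheduler defaults from the notes can be kept in one place and reused for image-to-image generation. The sketch below is illustrative only: `RECOMMENDED`, `sampling_kwargs`, and `image_to_image` are hypothetical names, the `strength` value is an arbitrary example rather than a recommendation from the Kolors team, and the function assumes a CUDA device and a diffusers build that includes `KolorsImg2ImgPipeline`.

```python
# Defaults copied from the notes above, keyed by scheduler class name.
RECOMMENDED = {
    "EulerDiscreteScheduler": {"guidance_scale": 5.0, "num_inference_steps": 50},
    "EDMDPMSolverMultistepScheduler": {"guidance_scale": 5.0, "num_inference_steps": 25},
}


def sampling_kwargs(scheduler_name, **overrides):
    """Recommended sampling arguments for a scheduler, with optional overrides."""
    if scheduler_name not in RECOMMENDED:
        raise KeyError(f"no recommended settings for {scheduler_name}")
    return {**RECOMMENDED[scheduler_name], **overrides}


def image_to_image(prompt, init_image_path, strength=0.7, seed=66):
    """Run image-to-image generation with the recommended sampling settings."""
    # Heavy imports are kept local so the helper above has no dependencies.
    import torch
    from diffusers import KolorsImg2ImgPipeline
    from diffusers.utils import load_image

    pipe = KolorsImg2ImgPipeline.from_pretrained(
        "Kwai-Kolors/Kolors-diffusers",
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")

    # Look up the defaults for whichever scheduler the pipeline was loaded with.
    kwargs = sampling_kwargs(type(pipe.scheduler).__name__)
    return pipe(
        prompt=prompt,
        image=load_image(init_image_path),
        strength=strength,  # illustrative value: how far to move from the input
        generator=torch.Generator(pipe.device).manual_seed(seed),
        **kwargs,
    ).images[0]
```

Keeping the defaults in a dictionary means a scheduler swap only requires changing the lookup key, not every call site.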

IP-Adapter-Plus

We provide IP-Adapter-Plus weights and inference code; see the ipadapter directory for details.

# Weights download
huggingface-cli download --resume-download Kwai-Kolors/Kolors-IP-Adapter-Plus --local-dir weights/Kolors-IP-Adapter-Plus
# Inference
python3 ipadapter/sample_ipadapter_plus.py ./ipadapter/asset/test_ip.jpg "T"

python3 ipadapter/sample_ipadapter_plus.py ./ipadapter/asset/test_ip2.png ""

# The image will be saved to "scripts/outputs/"

ControlNet

We provide weights and inference code for three ControlNets (Canny, Depth, and Pose); see the controlnet directory for details.

# Weights download

# Canny - ControlNet
huggingface-cli download --resume-download Kwai-Kolors/Kolors-ControlNet-Canny --local-dir weights/Kolors-ControlNet-Canny

# Depth - ControlNet
huggingface-cli download --resume-download Kwai-Kolors/Kolors-ControlNet-Depth --local-dir weights/Kolors-ControlNet-Depth

# Pose - ControlNet
huggingface-cli download --resume-download Kwai-Kolors/Kolors-ControlNet-Pose --local-dir weights/Kolors-ControlNet-Pose

If you intend to utilize the depth estimation network, please make sure to download its corresponding model weights.

huggingface-cli download lllyasviel/Annotators ./dpt_hybrid-midas-501f0c75.pt --local-dir ./controlnet/annotator/ckpts

Pose estimation relies on DWPose. To use it, download the pose model dw-ll_ucoco_384.onnx (baidu, google) and the detection model yolox_l.onnx (baidu, google), then place both files in controlnet/annotator/ckpts/.
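
Before running inference, it can help to confirm that the annotator checkpoints are in place. The sketch below uses the file names and the directory given above; the `REQUIRED` mapping and the `missing_checkpoints` helper are hypothetical, and the empty list for Canny is an assumption (this README lists no annotator weights for it).

```python
from pathlib import Path

# Directory named in the instructions above.
CKPT_DIR = Path("controlnet/annotator/ckpts")

# Annotator files per ControlNet mode, as listed in this README.
# Assumption: Canny needs no extra annotator weights.
REQUIRED = {
    "Canny": [],
    "Depth": ["dpt_hybrid-midas-501f0c75.pt"],
    "Pose": ["dw-ll_ucoco_384.onnx", "yolox_l.onnx"],
}


def missing_checkpoints(mode, ckpt_dir=CKPT_DIR):
    """Return the annotator files for `mode` that are not present on disk."""
    return [
        name
        for name in REQUIRED[mode]
        if not (Path(ckpt_dir) / name).is_file()
    ]


if __name__ == "__main__":
    for mode in REQUIRED:
        missing = missing_checkpoints(mode)
        status = "ready" if not missing else "missing " + ", ".join(missing)
        print(f"{mode}: {status}")
```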

# Inference

python ./controlnet/sample_controlNet.py ./controlnet/assets/woman_1.png 8k4K Canny

python ./controlnet/sample_controlNet.py ./controlnet/assets/woman_2.png 8K Depth

python ./controlnet/sample_controlNet.py ./controlnet/assets/woman_3.png 8k4K Pose

# The image will be saved to "controlnet/outputs/"

Inpainting

We provide Inpainting weights and inference code; see the inpainting directory for details.

# Weights download
huggingface-cli download --resume-download Kwai-Kolors/Kolors-Inpainting --local-dir weights/Kolors-Inpainting
# Inference
python3 inpainting/sample_inpainting.py ./inpainting/asset/3.png ./inpainting/asset/3_mask.png 32k

python3 inpainting/sample_inpainting.py ./inpainting/asset/4.png ./inpainting/asset/4_mask.png 

# The image will be saved to "scripts/outputs/"

IP-Adapter-FaceID-Plus

We provide IP-Adapter-FaceID-Plus weights and inference code; see the ipadapter_FaceID directory for details.

# Weights download
huggingface-cli download --resume-download Kwai-Kolors/Kolors-IP-Adapter-FaceID-Plus --local-dir weights/Kolors-IP-Adapter-FaceID-Plus
# Inference
python ipadapter_FaceID/sample_ipadapter_faceid_plus.py ./ipadapter_FaceID/assets/image1.png ""

python ipadapter_FaceID/sample_ipadapter_faceid_plus.py ./ipadapter_FaceID/assets/image2.png ",, , XT4, , , "

# The image will be saved to "scripts/outputs/"

Dreambooth-LoRA

We provide LoRA training and inference code; see Dreambooth-LoRA for details.

# Training:
sh train.sh
# Inference
python infer_dreambooth.py "ktxl"

License & Citation & Acknowledgments

License

Kolors weights are fully open for academic research. If you intend to use the Kolors model or its derivatives for commercial purposes under the licensing terms and conditions, please send the questionnaire to [email protected] to register with the Licensor. If the monthly active users of all products or services made available by or for Licensee do not exceed 300 million in the preceding calendar month, Your registration with the Licensor will be deemed to have obtained the corresponding business license. If the monthly active users of all products or services made available by or for Licensee exceed 300 million in the preceding calendar month, You must request a license from Licensor, which the Licensor may grant to You in its sole discretion, and You are not authorized to exercise any of the rights under this Agreement unless or until We otherwise expressly grant You such rights.

We open-source Kolors to promote the development of large text-to-image models in collaboration with the open-source community. The code of this project is open-sourced under the Apache-2.0 license. We sincerely urge all developers and users to strictly adhere to the open-source license and to avoid using the open-source model, code, and its derivatives for any purposes that may harm the country and society, or for any services that have not been evaluated and registered for safety. Despite our best efforts to ensure the compliance, accuracy, and safety of the data during training, we cannot guarantee the accuracy and safety of the output content, owing to the diversity and combinability of generated content and the probabilistic randomness affecting the model; the model can also be misled. This project does not assume any legal responsibility for any data security issues, public opinion risks, or risks and liabilities arising from the model being misled, abused, misused, or improperly utilized due to the use of the open-source model and code.

Citation

If you find our work helpful, please cite it!

@article{kolors,
  title={Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis},
  author={Kolors Team},
  journal={arXiv preprint},
  year={2024}
}

Acknowledgments

  • Thanks to Diffusers for providing the codebase.
  • Thanks to ChatGLM3 for providing the powerful Chinese language model.

Contact Us

If you want to leave a message for our R&D team and product team, feel free to join our WeChat group. You can also contact us via email ([email protected]).
