
📣 📣 Updates

[2024.08.02] 🔥 EchoMimic is now available on Hugging Face with an A100 GPU. Thanks Wenmeng Zhou@ModelScope.
[2024.07.25] 🔥🔥🔥 Accelerated models and pipeline for Audio Driven are released. Inference can be about 10x faster (from ~7 min/240 frames to ~50 s/240 frames on a V100 GPU).
[2024.07.23] 🔥 EchoMimic gradio demo on modelscope is ready.
[2024.07.23] 🔥 EchoMimic gradio demo on huggingface is ready. Thanks Sylvain Filoni@fffiloni.
[2024.07.17] 🔥🔥🔥 Accelerated models and pipeline for Audio + Selected Landmarks are released. Inference can be about 10x faster (from ~7 min/240 frames to ~50 s/240 frames on a V100 GPU).
[2024.07.14] 🔥 ComfyUI is now available. Thanks @smthemex for the contribution.
[2024.07.13] 🔥 Thanks NewGenAI for the video installation tutorial.
[2024.07.13] 🔥 We released our pose & audio driven code and models.
[2024.07.12] 🔥 WebUI and GradioUI versions are released. We thank @greengerong @Robin021 and @O-O1024 for their contributions.
[2024.07.12] 🔥 Our paper is publicly available on arXiv.
[2024.07.09] 🔥 We released our audio driven code and models.

Gallery

Audio Driven (Sing)

Audio Driven (English)

Audio Driven (Chinese)

Landmark Driven

Audio + Selected Landmark Driven

（Some demo images above are sourced from image websites. If there is any infringement, we will immediately remove them and apologize.）

Installation

Download the Codes

  git clone https://github.com/BadToBest/EchoMimic
  cd EchoMimic

Python Environment Setup

Tested System Environment: CentOS 7.2 / Ubuntu 22.04, CUDA >= 11.7
Tested GPUs: A100(80G) / RTX4090D (24G) / V100(16G)
Tested Python Version: 3.8 / 3.10 / 3.11

Create conda environment (Recommended):

  conda create -n echomimic python=3.8
  conda activate echomimic
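
Once the environment is active, a quick check can confirm that the interpreter is one of the tested versions listed above. This is a minimal sketch, not part of the repository; the function name is_tested_python is hypothetical.

```python
import sys

# Versions listed under "Tested Python Version" above.
TESTED_PYTHON_VERSIONS = {(3, 8), (3, 10), (3, 11)}


def is_tested_python(version_info=None):
    """Return True if the major.minor version is in the tested set."""
    if version_info is None:
        version_info = sys.version_info
    return (version_info[0], version_info[1]) in TESTED_PYTHON_VERSIONS


if __name__ == "__main__":
    current = "{}.{}".format(*sys.version_info[:2])
    if is_tested_python():
        print(f"Python {current} is one of the tested versions.")
    else:
        print(f"Python {current} is untested; the tested versions are 3.8, 3.10 and 3.11.")
```

An untested version is not necessarily broken; the check only reports whether it matches the list above.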

Install packages with pip

  pip install -r requirements.txt

Download ffmpeg-static

Download and decompress ffmpeg-static, then

  export FFMPEG_PATH=/path/to/ffmpeg-4.4-amd64-static
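
To verify that the variable is visible from Python, something like the following sketch may help. It assumes that the extracted ffmpeg-static directory contains an executable named ffmpeg at its top level; that layout and the helper name find_ffmpeg are assumptions, not something the repository documents.

```python
import os


def find_ffmpeg(env=None):
    """Return the path to the ffmpeg binary inside FFMPEG_PATH, or None."""
    if env is None:
        env = os.environ
    ffmpeg_dir = env.get("FFMPEG_PATH")
    if not ffmpeg_dir:
        return None
    # Assumption: the static build keeps an executable named "ffmpeg"
    # directly inside the extracted directory.
    candidate = os.path.join(ffmpeg_dir, "ffmpeg")
    return candidate if os.path.isfile(candidate) else None


if __name__ == "__main__":
    path = find_ffmpeg()
    print(path if path else "FFMPEG_PATH is unset or does not contain an ffmpeg binary.")
```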

Download pretrained weights

  git lfs install
  git clone https://huggingface.co/BadToBest/EchoMimic pretrained_weights

The pretrained_weights directory is organized as follows.

./pretrained_weights/
├── denoising_unet.pth
├── reference_unet.pth
├── motion_module.pth
├── face_locator.pth
├── sd-vae-ft-mse
│   └── ...
├── sd-image-variations-diffusers
│   └── ...
└── audio_processor
    └── whisper_tiny.pt

Here, denoising_unet.pth, reference_unet.pth, motion_module.pth and face_locator.pth are the main checkpoints of EchoMimic. The other models in this hub (sd-vae-ft-mse, sd-image-variations-diffusers and the Whisper model under audio_processor) can also be downloaded from their original hubs, thanks to their authors' brilliant work.
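
Because an incomplete git lfs download is easy to miss, a small check against the directory tree above can be useful. The sketch below only tests that each listed entry exists; it does not validate file contents, and the helper name missing_weights is hypothetical.

```python
from pathlib import Path

# Entries taken from the directory tree above.
EXPECTED_ENTRIES = [
    "denoising_unet.pth",
    "reference_unet.pth",
    "motion_module.pth",
    "face_locator.pth",
    "sd-vae-ft-mse",
    "sd-image-variations-diffusers",
    "audio_processor/whisper_tiny.pt",
]


def missing_weights(root="./pretrained_weights"):
    """Return the expected entries that are absent under root."""
    root = Path(root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing entries:", ", ".join(missing))
    else:
        print("All expected entries are present.")
```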

Audio-Driven Algo Inference

Run the python inference script:

  python -u infer_audio2vid.py
  python -u infer_audio2vid_pose.py

Audio-Driven Algo Inference On Your Own Cases

Edit the inference config file ./configs/prompts/animation.yaml, and add your own case:

test_cases:
  "path/to/your/image":
    - "path/to/your/audio"

Then run the python inference script:

  python -u infer_audio2vid.py
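
When there are many image/audio pairs, the test_cases entry shown above can be generated rather than typed by hand. The following is a minimal sketch using only the standard library; render_test_cases is a hypothetical helper, and the output is meant to be pasted into ./configs/prompts/animation.yaml in place of the existing test_cases block. It does not touch any other keys in that file.

```python
import json


def render_test_cases(cases):
    """Render an image -> [audio, ...] mapping as a test_cases YAML block."""
    lines = ["test_cases:"]
    for image_path, audio_paths in cases.items():
        # json.dumps yields a double-quoted string, which is also valid YAML.
        lines.append(f"  {json.dumps(image_path)}:")
        for audio_path in audio_paths:
            lines.append(f"    - {json.dumps(audio_path)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_test_cases({"path/to/your/image": ["path/to/your/audio"]}), end="")
```

Run as a script, this prints the same three-line block as the example config above.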

Motion Alignment between Ref. Img. and Driven Vid.

(First, download the checkpoints with the '_pose.pth' suffix from Hugging Face.)

Edit driver_video and ref_image to your path in demo_motion_sync.py, then run

  python -u demo_motion_sync.py

Audio&Pose-Driven Algo Inference

Edit ./configs/prompts/animation_pose.yaml, then run

  python -u infer_audio2vid_pose.py

Pose-Driven Algo Inference

Set draw_mouse=True in line 135 of infer_audio2vid_pose.py. Edit ./configs/prompts/animation_pose.yaml, then run

  python -u infer_audio2vid_pose.py

Run the Gradio UI

Thanks to the contribution from @Robin021:

  python -u webgui.py --server_port=3000

Release Plans

Status	Milestone	ETA
✅	Inference source code of the Audio-Driven algo released on GitHub	9th July, 2024
✅	Pretrained models trained on English and Mandarin Chinese to be released	9th July, 2024
✅	Inference source code of the Pose-Driven algo released on GitHub	13th July, 2024
✅	Pretrained models with better pose control to be released	13th July, 2024
✅	Accelerated models to be released	17th July, 2024
🚀	Pretrained models with better singing performance to be released	TBD
🚀	Large-Scale and High-resolution Chinese-Based Talking Head Dataset	TBD

Acknowledgements

We would like to thank the contributors to the AnimateDiff, Moore-AnimateAnyone and MuseTalk repositories, for their open research and exploration.

We are also grateful to V-Express and hallo for their outstanding work in the area of diffusion-based talking heads.

If we missed any open-source projects or related articles, we will add an acknowledgement of that work immediately.

Citation

If you find our work useful for your research, please consider citing the paper:

@misc{chen2024echomimic,
  title={EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning},
  author={Zhiyuan Chen and Jiajiong Cao and Zhiquan Chen and Yuming Li and Chenguang Ma},
  year={2024},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
