MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising
Zhiqiang Xia\*, Zhaokang Chen\*, Bin Wu†, Chao Li, Kwok-Wai Hung, Chao Zhan, Yingjie He, Wenjiang Zhou (\*co-first author, †corresponding author, [email protected])
github huggingface HuggingfaceSpace project Technical report (coming soon)
We have held the world-simulator vision since March 2023, believing that diffusion models can simulate the world. MuseV was a milestone we achieved around July 2023. Amazed by the progress of Sora, we decided to open-source MuseV and hope it benefits the community; next we will move on to the promising diffusion+transformer scheme.

Update: we have released MuseTalk, a real-time, high-quality lip-sync model, which can be combined with MuseV as a complete virtual-human generation solution.
🆕 We are excited to announce the release of MusePose, an image-to-video generation framework for virtual humans under control signals such as pose. Together with MuseV and MuseTalk, we hope the community can join us in marching towards the vision of generating a virtual human end to end, with native full-body movement and interaction ability.
MuseV is a diffusion-based virtual human video generation framework, which
1. supports infinite-length video generation with a novel Visual Conditioned Parallel Denoising scheme;
2. provides checkpoints for virtual human video generation, trained on a human dataset;
3. supports Image2Video, Text2Image2Video, and Video2Video;
4. is compatible with the Stable Diffusion ecosystem, including `base_model`, `lora`, `controlnet`, etc.;
5. supports multi-reference-image technology, including `IPAdapter`, `ReferenceOnly`, `ReferenceNet`, and `IPAdapterFaceID`.
Important bug fix: for `musev_referencenet_pose`, the model names of `unet` and `ip_adapter` in the example command were wrong; please use `musev_referencenet_pose` instead of `musev_referencenet`.
News: the `main` branch of the MuseV project has been released, together with the trained models `musev`, `muse_referencenet`, and `muse_referencenet_pose`.
All frames were generated directly from the text2video model, without any post-processing; more cases can be found on the MuseV project page. The examples below can be accessed at `configs/tasks/example.yaml`.
The project-page cases cover text/image2video for humans and scenes, and pose2video examples such as MJ and Duffy. In the Duffy case, the pose of the visual-condition frame is not aligned with the first frame of the control video; the posealign module solves this problem (posealign is to be released).
Quickstart: prepare the python environment and install the extra packages, such as `diffusers`, `controlnet_aux`, and `mmcm`.
Third-party netdisk download: https://www.123pan.com/s/Pf5Yjv-Bb9W3.html (extraction code: glut)
Prepare the python environment.
**Attention**: the docker image is strongly recommended, as it avoids problems caused by the python environment; conda is provided as an alternative.
Docker:
docker pull anchorxia/musev:latest
docker run --gpus all -it --entrypoint /bin/bash anchorxia/musev:latest
The default conda environment inside the docker image is `musev`.
If not using docker, create the conda environment from `environment.yml` and install the pip requirements:
conda env create --name musev --file ./environment.yml
pip install -r requirements.txt
If not using docker, the mmlab packages must be installed additionally:
pip install --no-cache-dir -U openmim
mim install mmengine
mim install "mmcv>=2.0.1"
mim install "mmdet>=3.1.0"
mim install "mmpose>=1.1.0"
Prepare the custom/modified packages: clone the repo with its submodules, then add them to PYTHONPATH.
git clone --recursive https://github.com/TMElyralab/MuseV.git
current_dir=$(pwd)
export PYTHONPATH=${PYTHONPATH}:${current_dir}/MuseV
export PYTHONPATH=${PYTHONPATH}:${current_dir}/MuseV/MMCM
export PYTHONPATH=${PYTHONPATH}:${current_dir}/MuseV/diffusers/src
export PYTHONPATH=${PYTHONPATH}:${current_dir}/MuseV/controlnet_aux/src
cd MuseV
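For interactive use, e.g. in a notebook, the same four paths can be added at runtime instead of via shell exports; a minimal sketch, assuming MuseV was cloned into the current directory:

```python
import os
import sys

# Mirror the PYTHONPATH exports above: the repo root plus the bundled
# modified packages (MMCM, diffusers, controlnet_aux).
current_dir = os.getcwd()
for sub in ("MuseV", "MuseV/MMCM", "MuseV/diffusers/src", "MuseV/controlnet_aux/src"):
    path = os.path.join(current_dir, sub)
    if path not in sys.path:
        sys.path.insert(0, path)
```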
1. MMCM: a multimedia, cross-modal processing package.
2. diffusers: a modified diffusers package, based on diffusers.
3. controlnet_aux: a modified package, based on controlnet_aux.
Download the models:
git clone https://huggingface.co/TMElyralab/MuseV ./checkpoints
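After cloning, a quick sanity check that the top-level checkpoint folders are in place (a sketch; the folder names are assumptions based on the model list that follows):

```python
from pathlib import Path

# Top-level folders expected under ./checkpoints (assumed from the model list).
EXPECTED = ["motion", "t2i", "IP-Adapter"]

def missing_checkpoints(root: str = "./checkpoints") -> list:
    """Return the expected checkpoint folders that are not present."""
    root_path = Path(root)
    return [name for name in EXPECTED if not (root_path / name).is_dir()]

print(missing_checkpoints())  # empty once the download completed
```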
- motion: text2video models, trained on tiny ucf101 and tiny webvid, approximately 60K video-text pairs. GPU memory consumption below was measured at resolution 512*512 with time_size=12.
  - musev/unet: contains and trains only the unet motion module. GPU memory consumption ≈ 8 GB.
  - musev_referencenet: trains the unet motion module, referencenet, and IPAdapter. GPU memory consumption ≈ 12 GB.
    - unet: the motion module, with `to_k`, `to_v` in the Attention layers to take the IPAdapter image embedding.
    - referencenet: similar to AnimateAnyone.
    - ip_adapter_image_proj.bin: the image clip embedding projection layer, from IPAdapter.
  - musev_referencenet_pose: based on musev_referencenet; referencenet and controlnet_pose are frozen, while the unet motion module and IPAdapter are trained. GPU memory consumption ≈ 12 GB.
- t2i/sd1.5: text2image models, whose parameters are frozen while training the motion module. Different t2i base models have a significant impact on the result.
  - majicmixRealv6Fp16: an example; it can be replaced with other t2i base models. Download from majicmixRealv6Fp16.
  - fantasticmix_v10: used in training. Download from fantasticmix_v10.
- IP-Adapter/models: download from IPAdapter.
  - image_encoder: the vision clip model.
  - ip-adapter_sd15.bin: the original IPAdapter model checkpoint.
  - ip-adapter-faceid_sd15.bin: the original IPAdapterFaceID model checkpoint.
Prepare model paths. Skip this step when running the example task with the example inference commands. Otherwise, set the model paths and abbreviations in the following configs so the inference scripts can use them:
- T2I SD models: refer to `musev/configs/model/T2I_all_model.py`
- motion unet models: refer to `musev/configs/model/motion_model.py`
- tasks: refer to `musev/configs/tasks/example.yaml`
python scripts/inference/text2video.py --sd_model_name majicmixRealv6Fp16 --unet_model_name musev_referencenet --referencenet_model_name musev_referencenet --ip_adapter_model_name musev_referencenet --test_data_path ./configs/tasks/example.yaml --output_dir ./output --n_batch 1 --target_datas yongen --vision_clip_extractor_class_name ImageClipVisionFeatureExtractor --vision_clip_model_path ./checkpoints/IP-Adapter/models/image_encoder --time_size 12 --fps 12
Common parameters:
- test_data_path: path of the test-data yaml; each test_data (task) in it defines fields such as name, prompt, and condition images.
- target_datas: only the test_datas whose name is in target_datas are run; the separator is ",". Set it to "all" to run every test_data in test_data_path.
- sd_model_cfg_path: config path (or direct model path) of the T2I sd models.
- sd_model_name: sd model name, used to choose the full model path in sd_model_cfg_path. Multiple names are separated by ",", or set "all" to use every model in the config.
- unet_model_cfg_path: config path (or direct model path) of the motion unet models.
- unet_model_name: unet model name, used to get the model path in unet_model_cfg_path and to initialize the unet class instance in `musev/models/unet_loader.py`. Multiple names are separated by ",", or set "all". If unet_model_cfg_path is a direct model path, the unet_name must be supported in `musev/models/unet_loader.py`.
- time_size: number of frames generated per diffusion denoising pass. Default is 12.
- n_batch: number of sequential generation shots; total_frames = n_batch * time_size + n_viscond. Default is 1.
- context_frames: number of context frames per diffusion pass. If time_size > context_frames, the time_size window is split into several sub-windows for parallel denoising. Default is 12.

To generate long videos there are two ways: (1) visual conditioned parallel denoising, with `n_batch=1` and `time_size` set to the total number of frames you want; (2) sequential generation, with `time_size` = `context_frames` (default `12`) and `context_overlap` = 0, extending the video across successive shots via `n_batch`.
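The frame accounting and window splitting above can be sketched as follows (illustrative only; `n_viscond` is the number of visual-condition frames, assumed to be 1 here, and the real splitting logic lives in the repo):

```python
def total_frames(n_batch: int, time_size: int, n_viscond: int = 1) -> int:
    """Total frame count: n_batch shots of time_size frames each,
    plus the visual-condition frame(s) counted once."""
    return n_batch * time_size + n_viscond

def sub_windows(time_size: int, context_frames: int, context_overlap: int = 0):
    """When time_size > context_frames, the window is split into
    sub-windows (optionally overlapping) that are denoised in parallel."""
    step = context_frames - context_overlap
    return [
        list(range(start, min(start + context_frames, time_size)))
        for start in range(0, time_size, step)
    ]

print(total_frames(n_batch=1, time_size=12))         # defaults: 12 + 1 = 13
print(sub_windows(time_size=24, context_frames=12))  # two 12-frame sub-windows
```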
Model parameters. The model supports `referencenet`, `IPAdapter`, `IPAdapterFaceID`, and `Facein`.
- referencenet_model_name: the referencenet model name.
- vision_clip_extractor_class_name: the ImageEmbExtractor class name, e.g. ImageClipVisionFeatureExtractor, which extracts the vision clip embedding used in IPAdapter.
- vision_clip_model_path: the model path for ImageClipVisionFeatureExtractor.
- ip_adapter_model_name: from IPAdapter; the ImagePromptEmbProj image-prompt projection layer, used together with the ImageEmbExtractor.
- ip_adapter_face_model_name: IPAdapterFaceID, from IPAdapter, used to preserve the face identity; face_image_path must be set.

Some parameters that affect the motion range and the generated result:
- video_guidance_scale: similar to text2image, it balances the influence between cond and uncond. Default is 3.5.
- use_condition_image: whether to use the given first frame as the condition image for video generation. Default is True.
- redraw_condition_image: whether to redraw the given first-frame image before using it.
- video_negative_prompt: abbreviation of the full negative_prompt defined in the config path. Default is V2.
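video_guidance_scale works like the classifier-free guidance scale in text2image. A minimal sketch of the usual cond/uncond blending (illustrative only, not MuseV's actual implementation):

```python
def guided_pred(uncond, cond, video_guidance_scale=3.5):
    """Classifier-free guidance blend: move the prediction away from the
    unconditional branch toward the conditional one."""
    return [u + video_guidance_scale * (c - u) for u, c in zip(uncond, cond)]

# With scale 1.0 the result equals the conditional prediction;
# larger scales exaggerate the conditional direction.
print(guided_pred([0.0, 0.0], [1.0, 1.0]))  # -> [3.5, 3.5]
```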
python scripts/inference/video2video.py --sd_model_name majicmixRealv6Fp16 --unet_model_name musev_referencenet --referencenet_model_name musev_referencenet --ip_adapter_model_name musev_referencenet --test_data_path ./configs/tasks/example.yaml --vision_clip_extractor_class_name ImageClipVisionFeatureExtractor --vision_clip_model_path ./checkpoints/IP-Adapter/models/image_encoder --output_dir ./output --n_batch 1 --controlnet_name dwpose_body_hand --which2video "video_middle" --target_datas dance1 --fps 12 --time_size 12
Most parameters are the same as for musev_text2video. The special parameters of video2video are:
1. video_path must be set in test_data. Both rgb video and controlnet_middle_video are now supported.
- which2video: which part of the rgb video influences the generation. With "video_middle", only the extracted control conditions (e.g. pose, depth) are used; with "video", the rgb video also influences the initial noise, which is stronger than the controlnet condition and behaves like img2img.
- controlnet_name: the controlnet condition(s) to use, e.g. "dwpose,depth"; multiple conditions are separated by ",". For pose, dwpose_body_hand is recommended.
- video_is_middle: whether video_path is an rgb video or an already extracted controlnet_middle_video. It can be set per test_data in test_data_path.
- video_has_condition: whether condition_images is aligned with the first frame of video_path. If not, the condition_images are generated first and then concatenated for alignment. Only effective when video_is_middle=True. Set per test_data.
All supported controlnet_names are defined in mmcm:
['pose', 'pose_body', 'pose_hand', 'pose_face', 'pose_hand_body', 'pose_hand_face', 'dwpose', 'dwpose_face', 'dwpose_hand', 'dwpose_body', 'dwpose_body_hand', 'canny', 'tile', 'hed', 'hed_scribble', 'depth', 'pidi', 'normal_bae', 'lineart', 'lineart_anime', 'zoe', 'sam', 'mobile_sam', 'leres', 'content', 'face_detector']
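Since --controlnet_name accepts a comma-separated list, a small helper (hypothetical, not part of the repo) can validate a value against the mmcm list above before launching a job:

```python
# Supported names, copied from the mmcm list above.
SUPPORTED_CONTROLNET_NAMES = [
    'pose', 'pose_body', 'pose_hand', 'pose_face', 'pose_hand_body',
    'pose_hand_face', 'dwpose', 'dwpose_face', 'dwpose_hand', 'dwpose_body',
    'dwpose_body_hand', 'canny', 'tile', 'hed', 'hed_scribble', 'depth',
    'pidi', 'normal_bae', 'lineart', 'lineart_anime', 'zoe', 'sam',
    'mobile_sam', 'leres', 'content', 'face_detector',
]

def parse_controlnet_names(arg: str) -> list:
    """Split a comma-separated --controlnet_name value and validate it."""
    names = [n.strip() for n in arg.split(',') if n.strip()]
    unknown = [n for n in names if n not in SUPPORTED_CONTROLNET_NAMES]
    if unknown:
        raise ValueError(f'unsupported controlnet names: {unknown}')
    return names

print(parse_controlnet_names('dwpose_body_hand'))  # -> ['dwpose_body_hand']
print(parse_controlnet_names('dwpose,depth'))      # -> ['dwpose', 'depth']
```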
musev_referencenet_pose: used only for pose2video. Based on musev_referencenet, it freezes referencenet, pose-controlnet, and the T2I model, and trains the motion module and IPAdapter.
python scripts/inference/video2video.py --sd_model_name majicmixRealv6Fp16 --unet_model_name musev_referencenet_pose --referencenet_model_name musev_referencenet --ip_adapter_model_name musev_referencenet_pose --test_data_path ./configs/tasks/example.yaml --vision_clip_extractor_class_name ImageClipVisionFeatureExtractor --vision_clip_model_path ./checkpoints/IP-Adapter/models/image_encoder --output_dir ./output --n_batch 1 --controlnet_name dwpose_body_hand --which2video "video_middle" --target_datas dance1 --fps 12 --time_size 12
musev: only has the motion module, without referencenet, and therefore requires less GPU memory.
text2video:
python scripts/inference/text2video.py --sd_model_name majicmixRealv6Fp16 --unet_model_name musev --test_data_path ./configs/tasks/example.yaml --output_dir ./output --n_batch 1 --target_datas yongen --time_size 12 --fps 12
video2video:
python scripts/inference/video2video.py --sd_model_name majicmixRealv6Fp16 --unet_model_name musev --test_data_path ./configs/tasks/example.yaml --output_dir ./output --n_batch 1 --controlnet_name dwpose_body_hand --which2video "video_middle" --target_datas dance1 --fps 12 --time_size 12
MuseV provides a gradio script to run a GUI on a local machine for convenient video generation:
cd scripts/gradio
python app.py
Acknowledgements: MuseV was built on the ucf101 and webvid datasets; thanks to the referenced open-source projects and datasets.

Limitations. MuseV still has many known limitations, including:
1. Limited generalization ability: some visual-condition images and some t2i pretrained models perform well, while others perform badly.
2. Limited video types and motion range, partly due to the limited training data. The released MuseV was trained on approximately 60K human text-video pairs at resolution 512*320. MuseV tends to have a larger motion range but lower quality at lower resolution, and higher quality with less motion at higher resolution; training on a larger, higher-resolution, higher-quality text-video dataset could improve it.
3. Watermarks may appear because of the webvid data; a cleaner, watermark-free dataset may solve this.
4. Limited long-video generation: visual conditioned parallel denoising removes the accumulated error of long-video generation, but it cannot yet cover scenarios with large content changes.
@article{musev,
title={MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising},
author={Xia, Zhiqiang and Chen, Zhaokang and Wu, Bin and Li, Chao and Hung, Kwok-Wai and Zhan, Chao and He, Yingjie and Zhou, Wenjiang},
journal={arxiv},
year={2024}
}
License and disclaimer:
1. code: the code of MuseV is released under the `MIT` License; there is no limitation for academic or commercial use.
2. model: the trained models are available for non-commercial research purposes only.
3. other open-source models: the other open-source models used, such as `insightface`, `IP-Adapter`, and `ft-mse-vae`, must comply with their own licenses.
4. AIGC: this project aims to positively impact the domain of AI-driven video generation; users are free to create videos with this tool but are expected to comply with local laws and use it responsibly.