MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising
Zhiqiang Xia\*, Zhaokang Chen\*, Bin Wu†, Chao Li, Kwok-Wai Hung, Chao Zhan, Yingjie He, Wenjiang Zhou (\*co-first author, †corresponding author, [email protected])
github huggingface HuggingfaceSpace project Technical report (coming soon)
We have held the world-simulator vision since March 2023, believing that diffusion models can simulate the world. MuseV was a milestone we achieved around July 2023. Amazed by the progress of Sora, we decided to open-source MuseV and hope it benefits the community; next we will move on to the promising diffusion+transformer scheme.

Update: we have released MuseTalk, a real-time, high-quality lip-sync model, which can be combined with MuseV as a complete virtual-human generation solution.
🆕 We are excited to announce the release of MusePose, an image-to-video generation framework for virtual humans under control signals such as pose. Together with MuseV and MuseTalk, we hope the community can join us in marching towards the vision of generating a virtual human end to end, with native full-body movement and interaction ability.
MuseV is a diffusion-based virtual human video generation framework, which
1. supports infinite-length video generation with a novel Visual Conditioned Parallel Denoising scheme;
2. provides checkpoints for virtual human video generation, trained on a human dataset;
3. supports Image2Video, Text2Image2Video, and Video2Video;
4. is compatible with the Stable Diffusion ecosystem, including `base_model`, `lora`, `controlnet`, etc.;
5. supports multi-reference-image technology, including `IPAdapter`, `ReferenceOnly`, `ReferenceNet`, and `IPAdapterFaceID`.
Important bug fix: for `musev_referencenet_pose`, the model names of `unet` and `ip_adapter` in the example command were wrong; please use `musev_referencenet_pose` instead of `musev_referencenet`.
News: the `main` branch of the MuseV project has been released, together with the trained models `musev`, `muse_referencenet`, and `muse_referencenet_pose`.
All frames were generated directly from the text2video model, without any post-processing; more cases can be found on the MuseV project page. The examples below can be accessed at `configs/tasks/example.yaml`.
The project-page cases cover text/image2video for humans and scenes, and pose2video examples such as MJ and Duffy. In the Duffy case, the pose of the visual-condition frame is not aligned with the first frame of the control video; the posealign module solves this problem (posealign is to be released).
Quickstart: prepare the python environment and install the extra packages, such as `diffusers`, `controlnet_aux`, and `mmcm`.
Third-party netdisk download: https://www.123pan.com/s/Pf5Yjv-Bb9W3.html (extraction code: glut)
Prepare the python environment.
**Attention**: the docker image is strongly recommended, as it avoids problems caused by the python environment; conda is provided as an alternative.
Docker:
docker pull anchorxia/musev:latest
docker run --gpus all -it --entrypoint /bin/bash anchorxia/musev:latest
The default conda environment inside the docker image is `musev`.
If not using docker, create the conda environment from `environment.yml` and install the pip requirements:
conda env create --name musev --file ./environment.yml
pip install -r requirements.txt
If not using docker, the mmlab packages must be installed additionally:
pip install --no-cache-dir -U openmim
mim install mmengine
mim install "mmcv>=2.0.1"
mim install "mmdet>=3.1.0"
mim install "mmpose>=1.1.0"
Prepare the custom/modified packages: clone the repo with its submodules, then add them to PYTHONPATH.
git clone --recursive https://github.com/TMElyralab/MuseV.git
current_dir=$(pwd)
export PYTHONPATH=${PYTHONPATH}:${current_dir}/MuseV
export PYTHONPATH=${PYTHONPATH}:${current_dir}/MuseV/MMCM
export PYTHONPATH=${PYTHONPATH}:${current_dir}/MuseV/diffusers/src
export PYTHONPATH=${PYTHONPATH}:${current_dir}/MuseV/controlnet_aux/src
cd MuseV
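For interactive use, e.g. in a notebook, the same four paths can be added at runtime instead of via shell exports; a minimal sketch, assuming MuseV was cloned into the current directory:

```python
import os
import sys

# Mirror the PYTHONPATH exports above: the repo root plus the bundled
# modified packages (MMCM, diffusers, controlnet_aux).
current_dir = os.getcwd()
for sub in ("MuseV", "MuseV/MMCM", "MuseV/diffusers/src", "MuseV/controlnet_aux/src"):
    path = os.path.join(current_dir, sub)
    if path not in sys.path:
        sys.path.insert(0, path)
```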
1. MMCM: a multimedia, cross-modal processing package.
2. diffusers: a modified diffusers package, based on diffusers.
3. controlnet_aux: a modified package, based on controlnet_aux.
Download the models:
git clone https://huggingface.co/TMElyralab/MuseV ./checkpoints
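After cloning, a quick sanity check that the top-level checkpoint folders are in place (a sketch; the folder names are assumptions based on the model list that follows):

```python
from pathlib import Path

# Top-level folders expected under ./checkpoints (assumed from the model list).
EXPECTED = ["motion", "t2i", "IP-Adapter"]

def missing_checkpoints(root: str = "./checkpoints") -> list:
    """Return the expected checkpoint folders that are not present."""
    root_path = Path(root)
    return [name for name in EXPECTED if not (root_path / name).is_dir()]

print(missing_checkpoints())  # empty once the download completed
```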
- motion: text2video models, trained on tiny ucf101 and tiny webvid, approximately 60K video-text pairs. GPU memory consumption below was measured at resolution 512*512 with time_size=12.
  - musev/unet: contains and trains only the unet motion module. GPU memory consumption ≈ 8 GB.
  - musev_referencenet: trains the unet motion module, referencenet, and IPAdapter. GPU memory consumption ≈ 12 GB.
    - unet: the motion module, with `to_k`, `to_v` in the Attention layers to take the IPAdapter image embedding.
    - referencenet: similar to AnimateAnyone.
    - ip_adapter_image_proj.bin: the image clip embedding projection layer, from IPAdapter.
  - musev_referencenet_pose: based on musev_referencenet; referencenet and controlnet_pose are frozen, while the unet motion module and IPAdapter are trained. GPU memory consumption ≈ 12 GB.
- t2i/sd1.5: text2image models, whose parameters are frozen while training the motion module. Different t2i base models have a significant impact on the result.
  - majicmixRealv6Fp16: an example; it can be replaced with other t2i base models. Download from majicmixRealv6Fp16.
  - fantasticmix_v10: used in training. Download from fantasticmix_v10.
- IP-Adapter/models: download from IPAdapter.
  - image_encoder: the vision clip model.
  - ip-adapter_sd15.bin: the original IPAdapter model checkpoint.
  - ip-adapter-faceid_sd15.bin: the original IPAdapterFaceID model checkpoint.
Prepare model paths. Skip this step when running the example task with the example inference commands. Otherwise, set the model paths and abbreviations in the following configs so the inference scripts can use them:
- T2I SD models: refer to `musev/configs/model/T2I_all_model.py`
- motion unet models: refer to `musev/configs/model/motion_model.py`
- tasks: refer to `musev/configs/tasks/example.yaml`
python scripts/inference/text2video.py --sd_model_name majicmixRealv6Fp16 --unet_model_name musev_referencenet --referencenet_model_name musev_referencenet --ip_adapter_model_name musev_referencenet --test_data_path ./configs/tasks/example.yaml --output_dir ./output --n_batch 1 --target_datas yongen --vision_clip_extractor_class_name ImageClipVisionFeatureExtractor --vision_clip_model_path ./checkpoints/IP-Adapter/models/image_encoder --time_size 12 --fps 12
Common parameters:
- test_data_path: path of the test-data yaml; each test_data (task) in it defines fields such as name, prompt, and condition images.
- target_datas: only the test_datas whose name is in target_datas are run; the separator is ",". Set it to "all" to run every test_data in test_data_path.
- sd_model_cfg_path: config path (or direct model path) of the T2I sd models.
- sd_model_name: sd model name, used to choose the full model path in sd_model_cfg_path. Multiple names are separated by ",", or set "all" to use every model in the config.
- unet_model_cfg_path: config path (or direct model path) of the motion unet models.
- unet_model_name: unet model name, used to get the model path in unet_model_cfg_path and to initialize the unet class instance in `musev/models/unet_loader.py`. Multiple names are separated by ",", or set "all". If unet_model_cfg_path is a direct model path, the unet_name must be supported in `musev/models/unet_loader.py`.
- time_size: number of frames generated per diffusion denoising pass. Default is 12.
- n_batch: number of sequential generation shots; total_frames = n_batch * time_size + n_viscond. Default is 1.
- context_frames: number of context frames per diffusion pass. If time_size > context_frames, the time_size window is split into several sub-windows for parallel denoising. Default is 12.

To generate long videos there are two ways: (1) visual conditioned parallel denoising, with `n_batch=1` and `time_size` set to the total number of frames you want; (2) sequential generation, with `time_size` = `context_frames` (default `12`) and `context_overlap` = 0, extending the video across successive shots via `n_batch`.
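The frame accounting and window splitting above can be sketched as follows (illustrative only; `n_viscond` is the number of visual-condition frames, assumed to be 1 here, and the real splitting logic lives in the repo):

```python
def total_frames(n_batch: int, time_size: int, n_viscond: int = 1) -> int:
    """Total frame count: n_batch shots of time_size frames each,
    plus the visual-condition frame(s) counted once."""
    return n_batch * time_size + n_viscond

def sub_windows(time_size: int, context_frames: int, context_overlap: int = 0):
    """When time_size > context_frames, the window is split into
    sub-windows (optionally overlapping) that are denoised in parallel."""
    step = context_frames - context_overlap
    return [
        list(range(start, min(start + context_frames, time_size)))
        for start in range(0, time_size, step)
    ]

print(total_frames(n_batch=1, time_size=12))         # defaults: 12 + 1 = 13
print(sub_windows(time_size=24, context_frames=12))  # two 12-frame sub-windows
```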
Model parameters. The model supports `referencenet`, `IPAdapter`, `IPAdapterFaceID`, and `Facein`.
- referencenet_model_name: the referencenet model name.
- vision_clip_extractor_class_name: the ImageEmbExtractor class name, e.g. ImageClipVisionFeatureExtractor, which extracts the vision clip embedding used in IPAdapter.
- vision_clip_model_path: the model path for ImageClipVisionFeatureExtractor.
- ip_adapter_model_name: from IPAdapter; the ImagePromptEmbProj image-prompt projection layer, used together with the ImageEmbExtractor.
- ip_adapter_face_model_name: IPAdapterFaceID, from IPAdapter, used to preserve the face identity; face_image_path must be set.

Some parameters that affect the motion range and the generated result:
- video_guidance_scale: similar to text2image, it balances the influence between cond and uncond. Default is 3.5.
- use_condition_image: whether to use the given first frame as the condition image for video generation. Default is True.
- redraw_condition_image: whether to redraw the given first-frame image before using it.
- video_negative_prompt: abbreviation of the full negative_prompt defined in the config path. Default is V2.
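video_guidance_scale works like the classifier-free guidance scale in text2image. A minimal sketch of the usual cond/uncond blending (illustrative only, not MuseV's actual implementation):

```python
def guided_pred(uncond, cond, video_guidance_scale=3.5):
    """Classifier-free guidance blend: move the prediction away from the
    unconditional branch toward the conditional one."""
    return [u + video_guidance_scale * (c - u) for u, c in zip(uncond, cond)]

# With scale 1.0 the result equals the conditional prediction;
# larger scales exaggerate the conditional direction.
print(guided_pred([0.0, 0.0], [1.0, 1.0]))  # -> [3.5, 3.5]
```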
python scripts/inference/video2video.py --sd_model_name majicmixRealv6Fp16 --unet_model_name musev_referencenet --referencenet_model_name musev_referencenet --ip_adapter_model_name musev_referencenet --test_data_path ./configs/tasks/example.yaml --vision_clip_extractor_class_name ImageClipVisionFeatureExtractor --vision_clip_model_path ./checkpoints/IP-Adapter/models/image_encoder --output_dir ./output --n_batch 1 --controlnet_name dwpose_body_hand --which2video "video_middle" --target_datas dance1 --fps 12 --time_size 12
Most parameters are the same as for musev_text2video. The special parameters of video2video are:
1. video_path must be set in test_data. Both rgb video and controlnet_middle_video are now supported.
- which2video: which part of the rgb video influences the generation. With "video_middle", only the extracted control conditions (e.g. pose, depth) are used; with "video", the rgb video also influences the initial noise, which is stronger than the controlnet condition and behaves like img2img.
- controlnet_name: the controlnet condition(s) to use, e.g. "dwpose,depth"; multiple conditions are separated by ",". For pose, dwpose_body_hand is recommended.
- video_is_middle: whether video_path is an rgb video or an already extracted controlnet_middle_video. It can be set per test_data in test_data_path.
- video_has_condition: whether condition_images is aligned with the first frame of video_path. If not, the condition_images are generated first and then concatenated for alignment. Only effective when video_is_middle=True. Set per test_data.
All supported controlnet_names are defined in mmcm:
['pose', 'pose_body', 'pose_hand', 'pose_face', 'pose_hand_body', 'pose_hand_face', 'dwpose', 'dwpose_face', 'dwpose_hand', 'dwpose_body', 'dwpose_body_hand', 'canny', 'tile', 'hed', 'hed_scribble', 'depth', 'pidi', 'normal_bae', 'lineart', 'lineart_anime', 'zoe', 'sam', 'mobile_sam', 'leres', 'content', 'face_detector']
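Since --controlnet_name accepts a comma-separated list, a small helper (hypothetical, not part of the repo) can validate a value against the mmcm list above before launching a job:

```python
# Supported names, copied from the mmcm list above.
SUPPORTED_CONTROLNET_NAMES = [
    'pose', 'pose_body', 'pose_hand', 'pose_face', 'pose_hand_body',
    'pose_hand_face', 'dwpose', 'dwpose_face', 'dwpose_hand', 'dwpose_body',
    'dwpose_body_hand', 'canny', 'tile', 'hed', 'hed_scribble', 'depth',
    'pidi', 'normal_bae', 'lineart', 'lineart_anime', 'zoe', 'sam',
    'mobile_sam', 'leres', 'content', 'face_detector',
]

def parse_controlnet_names(arg: str) -> list:
    """Split a comma-separated --controlnet_name value and validate it."""
    names = [n.strip() for n in arg.split(',') if n.strip()]
    unknown = [n for n in names if n not in SUPPORTED_CONTROLNET_NAMES]
    if unknown:
        raise ValueError(f'unsupported controlnet names: {unknown}')
    return names

print(parse_controlnet_names('dwpose_body_hand'))  # -> ['dwpose_body_hand']
print(parse_controlnet_names('dwpose,depth'))      # -> ['dwpose', 'depth']
```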
musev_referencenet_pose: used only for pose2video. Based on musev_referencenet, it freezes referencenet, pose-controlnet, and the T2I model, and trains the motion module and IPAdapter.
python scripts/inference/video2video.py --sd_model_name majicmixRealv6Fp16 --unet_model_name musev_referencenet_pose --referencenet_model_name musev_referencenet --ip_adapter_model_name musev_referencenet_pose --test_data_path ./configs/tasks/example.yaml --vision_clip_extractor_class_name ImageClipVisionFeatureExtractor --vision_clip_model_path ./checkpoints/IP-Adapter/models/image_encoder --output_dir ./output --n_batch 1 --controlnet_name dwpose_body_hand --which2video "video_middle" --target_datas dance1 --fps 12 --time_size 12
musev: only has the motion module, without referencenet, and therefore requires less GPU memory.
text2video:
python scripts/inference/text2video.py --sd_model_name majicmixRealv6Fp16 --unet_model_name musev --test_data_path ./configs/tasks/example.yaml --output_dir ./output --n_batch 1 --target_datas yongen --time_size 12 --fps 12
video2video:
python scripts/inference/video2video.py --sd_model_name majicmixRealv6Fp16 --unet_model_name musev --test_data_path ./configs/tasks/example.yaml --output_dir ./output --n_batch 1 --controlnet_name dwpose_body_hand --which2video "video_middle" --target_datas dance1 --fps 12 --time_size 12
MuseV provides a gradio script to run a GUI on a local machine for convenient video generation:
cd scripts/gradio
python app.py
Acknowledgements: MuseV was built on the ucf101 and webvid datasets; thanks to the referenced open-source projects and datasets.

Limitations. MuseV still has many known limitations, including:
1. Limited generalization ability: some visual-condition images and some t2i pretrained models perform well, while others perform badly.
2. Limited video types and motion range, partly due to the limited training data. The released MuseV was trained on approximately 60K human text-video pairs at resolution 512*320. MuseV tends to have a larger motion range but lower quality at lower resolution, and higher quality with less motion at higher resolution; training on a larger, higher-resolution, higher-quality text-video dataset could improve it.
3. Watermarks may appear because of the webvid data; a cleaner, watermark-free dataset may solve this.
4. Limited long-video generation: visual conditioned parallel denoising removes the accumulated error of long-video generation, but it cannot yet cover scenarios with large content changes.
@article{musev,
title={MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising},
author={Xia, Zhiqiang and Chen, Zhaokang and Wu, Bin and Li, Chao and Hung, Kwok-Wai and Zhan, Chao and He, Yingjie and Zhou, Wenjiang},
journal={arxiv},
year={2024}
}
License and disclaimer:
1. code: the code of MuseV is released under the `MIT` License; there is no limitation for academic or commercial use.
2. model: the trained models are available for non-commercial research purposes only.
3. other open-source models: the other open-source models used, such as `insightface`, `IP-Adapter`, and `ft-mse-vae`, must comply with their own licenses.
4. AIGC: this project aims to positively impact the domain of AI-driven video generation; users are free to create videos with this tool but are expected to comply with local laws and use it responsibly.