MuseV

MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising


Zhiqiang Xia*, Zhaokang Chen*, Bin Wu†, Chao Li, Kwok-Wai Hung, Chao Zhan, Yingjie He, Wenjiang Zhou (*co-first author, †corresponding author, [email protected])

github | huggingface | HuggingfaceSpace | project | Technical report (coming soon)

We set up the world-simulator vision in March 2023, believing that diffusion models can simulate the world. MuseV was a milestone we achieved around July 2023. Amazed by the progress of Sora, we decided to open-source MuseV and hope it benefits the community. Next we will move on to a promising diffusion+transformer scheme.

We have also released MuseTalk, a real-time, high-quality lip-sync model that can be applied together with MuseV as a complete virtual-human solution.

🆕 We are excited to announce the release of MusePose, an image-to-video generation framework for virtual humans driven by control signals such as pose. Together with MuseV and MuseTalk, we hope the community can join us on the journey toward end-to-end virtual humans with full-body movement and interaction ability.

MuseV is a diffusion-based virtual human video generation framework that:

  1. supports infinite-length video generation with a novel Visual Conditioned Parallel Denoising scheme;
  2. is compatible with the Stable Diffusion ecosystem, including base_model, lora, controlnet, etc.;
  3. supports multiple reference-image techniques, including IPAdapter, ReferenceOnly, ReferenceNet and IPAdapterFaceID.

Important update

  1. musev_referencenet_pose: the model names of unet and ip_adapter in the config were specified incorrectly; please use musev_referencenet_pose instead of musev_referencenet, and use the latest main branch.

  • [03/27/2024] Released the MuseV project and the trained models musev, muse_referencenet and muse_referencenet_pose.
  • [03/30/2024] Added a huggingface space GUI to generate videos interactively.

Cases

All frames below were generated directly by MuseV without any post-processing. More examples can be found on the MuseV project page.

The models and input parameters used in the cases can be found in configs/tasks/example.yaml and on the project page.
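For orientation, one task entry in configs/tasks/example.yaml looks roughly like the sketch below (field values are illustrative; yongen is the task name used by --target_datas in the inference commands later in this README):

```yaml
- name: yongen                     # referenced via --target_datas yongen
  prompt: (masterpiece, best quality, highres:1), a person talking
  condition_images: ./data/images/yongen.jpeg  # visual condition frame
  refer_image: ./data/images/yongen.jpeg       # referencenet input
  ipadapter_image: ./data/images/yongen.jpeg   # IPAdapter input
  video_path: null                 # set for video2video tasks
  height: 1308
  width: 736
```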

  1. The input condition images were generated with controlnet or MJ (Midjourney).
  2. The corresponding videos were generated by MuseV from these inputs.

pose2video

In duffy mode, the pose of the vision-condition frame is not aligned with the first frame of the control video; the posealign module will solve this problem.

MuseTalk

The character in the talk cases is a well-known influencer; you can follow her on Douyin.

TODO

  • release the posealign module

Quickstart

Prepare the Python environment and install the extra packages diffusers, controlnet_aux and mmcm.

Third-party integration: the community provides a ComfyUI integration.

Prepared Windows package

If preparing the environment yourself is too cumbersome, a packaged Windows environment is available on the netdisk: https://www.123pan.com/s/Pf5Yjv-Bb9W3.html code: glut

We recommend using docker to prepare the Python environment.

Prepare the Python environment

**Attention**: we have only tested with docker; conda and other environments may have problems. We strongly recommend docker.

Method 1: docker

  1. Pull the docker image:

docker pull anchorxia/musev:latest

  2. Run the docker container:
docker run --gpus all -it --entrypoint /bin/bash anchorxia/musev:latest

The default conda environment in the docker image is musev.
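For a first run inside the container, a minimal session might look like this (a sketch; the env name musev comes from the note above, and the activation step is harmless if the env is already active):

```bash
conda activate musev   # default conda env shipped in the image
python --version       # sanity-check the interpreter
```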

Method 2: conda

Create the conda environment from environment.yml:

conda env create --name musev --file ./environment.yml

Method 3: pip

Install the dependencies from requirements.txt:

pip install -r requirements.txt

Prepare openmmlab packages

If you do not use docker, the mmlab packages must be installed additionally:

pip install --no-cache-dir -U openmim 
mim install mmengine 
mim install "mmcv>=2.0.1" 
mim install "mmdet>=3.1.0" 
mim install "mmpose>=1.1.0" 

Prepare custom packages / modified packages

Clone the repository together with its submodules:

git clone --recursive https://github.com/TMElyralab/MuseV.git

Prepare PYTHONPATH:

current_dir=$(pwd)
export PYTHONPATH=${PYTHONPATH}:${current_dir}/MuseV
export PYTHONPATH=${PYTHONPATH}:${current_dir}/MuseV/MMCM
export PYTHONPATH=${PYTHONPATH}:${current_dir}/MuseV/diffusers/src
export PYTHONPATH=${PYTHONPATH}:${current_dir}/MuseV/controlnet_aux/src
cd MuseV
  1. MMCM: multimedia, cross-modal processing package.
  2. diffusers: modified diffusers package, based on diffusers.
  3. controlnet_aux: modified package, based on controlnet_aux.
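As a quick sanity check that the custom packages resolve through PYTHONPATH (import names assumed to match the package directories above):

```bash
python -c "import mmcm, diffusers, controlnet_aux; print('ok')"
```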

Download models:

git clone https://huggingface.co/TMElyralab/MuseV ./checkpoints
  • motion: text2video models, trained on subsets of ucf101 and webvid, about 60K video-text pairs. GPU memory consumption below was measured at resolution=512*512, time_size=12.
    • musev/unet: only trains the unet motion module. GPU memory $\approx 8G$.
    • musev_referencenet: trains the unet motion module, referencenet and IPAdapter. GPU memory $\approx 12G$.
      • unet: motion module, with to_k and to_v in the Attention layers referring to IPAdapter.
      • referencenet: similar to AnimateAnyone.
      • ip_adapter_image_proj.bin: image clip-vision feature projection layer, referring to IPAdapter.
    • musev_referencenet_pose: based on musev_referencenet, freezes referencenet and controlnet_pose, and trains the unet motion module and IPAdapter. GPU memory $\approx 12G$.
  • t2i/sd1.5: text2image model; its parameters are frozen when training the motion module.
  • IP-Adapter/models: downloaded from IPAdapter.
    • image_encoder: clip vision model.
    • ip-adapter_sd15.bin: original IPAdapter model checkpoint.
    • ip-adapter-faceid_sd15.bin: original IPAdapterFaceID model checkpoint.
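After downloading, ./checkpoints should contain roughly the following layout (reconstructed from the list above; verify against the huggingface repo):

```
./checkpoints
├── motion
│   ├── musev
│   ├── musev_referencenet
│   └── musev_referencenet_pose
├── t2i
│   └── sd1.5
│       └── majicmixRealv6Fp16
└── IP-Adapter
    └── models
        ├── image_encoder
        ├── ip-adapter_sd15.bin
        └── ip-adapter-faceid_sd15.bin
```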

Prepare model paths. You can skip this step when running the example tasks with the commands below; model paths and abbreviations are set in config files, so that the abbreviations can be used in the inference scripts.

  • T2I SD models: refer to musev/configs/model/T2I_all_model.py
  • Motion unet models: refer to musev/configs/model/motion_model.py
  • Task config: refer to musev/configs/tasks/example.yaml

Inference

musev_referencenet

text2video:

python scripts/inference/text2video.py   --sd_model_name majicmixRealv6Fp16   --unet_model_name musev_referencenet --referencenet_model_name musev_referencenet --ip_adapter_model_name musev_referencenet   -test_data_path ./configs/tasks/example.yaml  --output_dir ./output  --n_batch 1  --target_datas yongen  --vision_clip_extractor_class_name ImageClipVisionFeatureExtractor --vision_clip_model_path ./checkpoints/IP-Adapter/models/image_encoder  --time_size 12 --fps 12  

  • test_data_path: path of the test-data yaml file.
  • target_datas: only tasks whose name in test_data_path is in target_datas are sampled; sep is ,.
  • sd_model_cfg_path: T2I sd model config path, mapping model names to weights.
  • sd_model_name: sd model name, used to choose the full model path in sd_model_cfg_path; multiple names with sep ,, or all.
  • unet_model_cfg_path: motion unet model config path.
  • unet_model_name: unet model name, used to get the model path in unet_model_cfg_path and to init the unet class instance in musev/models/unet_loader.py; multiple names with sep ,, or all. If unet_model_cfg_path is a model path, unet_name must be supported in musev/models/unet_loader.py.
  • time_size: number of frames per diffusion denoising pass; default 12.
  • n_batch: number of sliding-window generation passes, $total\_frames = n\_batch \times time\_size + n\_viscond$; default 1 (see the worked example below).
  • context_frames: context frame count. If time_size > context_frames, the time_size window is split into several sub-windows for parallel denoising; default 12.
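For example, with the defaults time_size=12, n_batch=1 and one visual-condition frame, $total\_frames = 1 \times 12 + 1 = 13$; raising n_batch to 4 gives $4 \times 12 + 1 = 49$ frames, since the windows are generated one after another.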

There are two ways to generate a long video (both are sketched below):

  1. visual conditioned parallel denoising: set `n_batch=1` and `time_size` = total frame count of the desired video;
  2. traditional end-to-end: set `time_size` = `context_frames` = frame count per window (default `12`) and `context_overlap` = 0, then increase `n_batch`.
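A concrete sketch of the two settings, based on the musev text2video command shown later (the 48-frame target is illustrative, and the context_frames/context_overlap flags follow the parameter names listed above):

```bash
# 1) visual conditioned parallel denoising: one large 48-frame window
python scripts/inference/text2video.py --sd_model_name majicmixRealv6Fp16 \
  --unet_model_name musev -test_data_path ./configs/tasks/example.yaml \
  --output_dir ./output --target_datas yongen \
  --n_batch 1 --time_size 48 --fps 12

# 2) traditional end-to-end: four sequential 12-frame windows, no overlap
python scripts/inference/text2video.py --sd_model_name majicmixRealv6Fp16 \
  --unet_model_name musev -test_data_path ./configs/tasks/example.yaml \
  --output_dir ./output --target_datas yongen \
  --n_batch 4 --time_size 12 --context_frames 12 --context_overlap 0 --fps 12
```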

Model parameters: the scripts support referencenet, IPAdapter, IPAdapterFaceID and Facein.

  • referencenet_model_name: referencenet model name.
  • vision_clip_extractor_class_name: the ImageEmbExtractor class; ImageClipVisionFeatureExtractor extracts the clip vision embedding used in IPAdapter.
  • vision_clip_model_path: model path for ImageClipVisionFeatureExtractor.
  • ip_adapter_model_name: from IPAdapter; an ImagePromptEmbProj used together with the ImageEmbExtractor.
  • ip_adapter_face_model_name: IPAdapterFaceID, from IPAdapter, used to keep the face identity; face_image_path should be set (see the sketch after this list).
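For example, to also keep the face identity, the text2video command above could be extended as below (a hedged sketch: whether musev_referencenet is a valid abbreviation for ip_adapter_face_model_name depends on your model config, and face_image_path must be set in the task entry):

```bash
python scripts/inference/text2video.py --sd_model_name majicmixRealv6Fp16 \
  --unet_model_name musev_referencenet --referencenet_model_name musev_referencenet \
  --ip_adapter_model_name musev_referencenet \
  --ip_adapter_face_model_name musev_referencenet \
  -test_data_path ./configs/tasks/example.yaml --output_dir ./output \
  --n_batch 1 --target_datas yongen \
  --vision_clip_extractor_class_name ImageClipVisionFeatureExtractor \
  --vision_clip_model_path ./checkpoints/IP-Adapter/models/image_encoder \
  --time_size 12 --fps 12
```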

Some parameters that affect the motion range and generation quality:

  • video_guidance_scale: as in text2image, controls the balance between cond and uncond; default 3.5.
  • use_condition_image: whether to use the given first frame as the visual condition for video generation; default True.
  • redraw_condition_image: whether to redraw the given first frame image.
  • video_negative_prompt: abbreviation of the full negative_prompt in the config path; default V2.

video2video:

python scripts/inference/video2video.py --sd_model_name majicmixRealv6Fp16  --unet_model_name musev_referencenet --referencenet_model_name   musev_referencenet --ip_adapter_model_name musev_referencenet    -test_data_path ./configs/tasks/example.yaml    --vision_clip_extractor_class_name ImageClipVisionFeatureExtractor --vision_clip_model_path ./checkpoints/IP-Adapter/models/image_encoder      --output_dir ./output  --n_batch 1 --controlnet_name dwpose_body_hand  --which2video "video_middle"  --target_datas dance1 --fps 12 --time_size 12

Most parameters are the same as for musev_text2video. Other important parameters of video2video are:

  1. video_path must be set as the reference video in test_data. The reference video now supports rgb video and controlnet_middle_video (see the yaml sketch after this list).
  • which2video: which video guides the generation, e.g. video_middle, pose, depth. video_middle uses only the extracted condition; video lets the rgb video also influence the initial noise, similar to img2img.
  • controlnet_name: whether to use a controlnet condition, such as dwpose or depth; for pose, dwpose_body_hand is recommended.
  • video_is_middle: whether video_path is an rgb video or a controlnet_middle_video. Can be set per task in test_data_path.
  • video_has_condition: whether condition_images is aligned with the first frame of video_path. If not, condition_images is generated first and then aligned by concatenation. Only takes effect when video_is_middle=True. Set in test_data.
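A video2video task entry in the test_data yaml would then look roughly like this (a sketch; field values are illustrative, and dance1 is the task name used by --target_datas in the commands here):

```yaml
- name: dance1                     # selected via --target_datas dance1
  prompt: (masterpiece, best quality, highres:1), a girl is dancing
  video_path: ./data/source_video/dance1.mp4  # reference video to redraw
  video_is_middle: false           # true if video_path is a controlnet middle video
  condition_images: ./data/images/girl.png
  refer_image: ./data/images/girl.png
  ipadapter_image: ./data/images/girl.png
  height: 960
  width: 512
```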

All supported controlnet_names are defined in mmcm:

['pose', 'pose_body', 'pose_hand', 'pose_face', 'pose_hand_body', 'pose_hand_face', 'dwpose', 'dwpose_face', 'dwpose_hand', 'dwpose_body', 'dwpose_body_hand', 'canny', 'tile', 'hed', 'hed_scribble', 'depth', 'pidi', 'normal_bae', 'lineart', 'lineart_anime', 'zoe', 'sam', 'mobile_sam', 'leres', 'content', 'face_detector']

musev_referencenet_pose

Used only for pose2video. Based on musev_referencenet, it freezes referencenet, pose-controlnet and the T2I unet, and trains the motion module and IPAdapter:

python scripts/inference/video2video.py --sd_model_name majicmixRealv6Fp16  --unet_model_name musev_referencenet_pose --referencenet_model_name   musev_referencenet --ip_adapter_model_name musev_referencenet_pose    -test_data_path ./configs/tasks/example.yaml    --vision_clip_extractor_class_name ImageClipVisionFeatureExtractor --vision_clip_model_path ./checkpoints/IP-Adapter/models/image_encoder      --output_dir ./output  --n_batch 1 --controlnet_name dwpose_body_hand  --which2video "video_middle"  --target_datas  dance1   --fps 12 --time_size 12

musev

Only the motion module is used, without referencenet; it requires less GPU memory.

text2video:

python scripts/inference/text2video.py   --sd_model_name majicmixRealv6Fp16   --unet_model_name musev   -test_data_path ./configs/tasks/example.yaml  --output_dir ./output  --n_batch 1  --target_datas yongen  --time_size 12 --fps 12

video2video:

python scripts/inference/video2video.py --sd_model_name majicmixRealv6Fp16  --unet_model_name musev    -test_data_path ./configs/tasks/example.yaml --output_dir ./output  --n_batch 1 --controlnet_name dwpose_body_hand  --which2video "video_middle"  --target_datas  dance1   --fps 12 --time_size 12

Gradio demo

MuseV provides a gradio script that launches a GUI on a local machine for convenient video generation:

cd scripts/gradio
python app.py

Acknowledgements

  1. MuseV is built with reference to TuneAVideo, diffusers, Moore-AnimateAnyone, animatediff, IP-Adapter, AnimateAnyone, VideoFusion and insightface.
  2. MuseV was trained on the ucf101 and webvid datasets.

There are still many known limitations of MuseV:

  1. Lack of generalization ability: some visual condition images and some pretrained t2i base models perform well, others badly.
  2. Limited video types and limited motion range, partly because of limited training data. The released MuseV was trained on about 60K human text-video pairs at resolution 512*320. MuseV tends to have a larger motion range but lower quality at low resolution, and higher quality but a smaller motion range at high resolution. Training on a larger, higher-resolution, higher-quality text-video dataset would make MuseV better.
  3. Watermarks may appear because of the webvid training data.
  4. Undertrained referencenet and IP-Adapter, due to limited time and resources.
  5. Under-structured code: MuseV supports rich and dynamic features, but the code is complex and not yet refactored; it takes time to get familiar with.

@article{musev,
  title={MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising},
  author={Xia, Zhiqiang and Chen, Zhaokang and Wu, Bin and Li, Chao and Hung, Kwok-Wai and Zhan, Chao and He, Yingjie and Zhou, Wenjiang},
  journal={arxiv},
  year={2024}
}

License / Disclaimer

  1. code: The code of MuseV is released under the MIT license; there is no limitation for academic or commercial use.
  2. model: The trained models are available for any purpose, even commercially.
  3. other open-source models: Other open-source models used in this project must comply with their own licenses, such as insightface, IP-Adapter and ft-mse-vae.
  4. AIGC: This project strives to impact the domain of AI-driven video generation positively. Users are granted the freedom to create videos with this tool, but they are expected to comply with local laws and use it responsibly. The developers do not assume any responsibility for potential misuse.
Related Projects

  • MuseTalk
  • MusePose