motion-latent-diffusion

[CVPR 2023] Executing your Commands via Motion Diffusion in Latent Space, a fast and high-quality motion diffusion model

MLD: Motion Latent Diffusion Models

Executing your Commands via Motion Diffusion in Latent Space

Project Page | arXiv - CVPR 2023

Motion Latent Diffusion (MLD) is a text-to-motion and action-to-motion diffusion model. Our work achieves state-of-the-art motion quality and is two orders of magnitude faster than previous diffusion models that operate on raw motion data.

🚩 News

  • [2023/06/20] MotionGPT, a unified motion-language model, is released! Do all your motion tasks in MotionGPT
  • [2023/03/08] Add the script for latent space visualization and the script for counting floating-point operations (FLOPs)
  • [2023/02/28] MLD got accepted by CVPR 2023!
  • [2023/02/02] Release the action-to-motion task; please refer to the config and the pre-trained model
  • [2023/01/18] Add a detailed readme for the configuration
  • [2023/01/09] Release the no-VAE config and pre-trained model; you can use the MLD framework to train diffusion on raw motion, like MDM
  • [2022/12/22] First release: demo and training for text-to-motion
  • [2022/12/08] Upload paper and init project; code will be released in two weeks

⚡ Quick Start

1. Conda environment

conda create python=3.9 --name mld
conda activate mld

Install PyTorch 1.12.1 and the packages in requirements.txt:

pip install -r requirements.txt

We test our code on Python 3.9.12 and PyTorch 1.12.1.

2. Dependencies

Run the scripts to download the dependency materials:

bash prepare/download_smpl_model.sh
bash prepare/prepare_clip.sh

For Text to Motion Evaluation

bash prepare/download_t2m_evaluators.sh

3. Pre-trained model

Run the script to download the pre-trained model:

bash prepare/download_pretrained_models.sh

4. (Optional) Download manually

Visit the Google Drive to download the previous dependencies and models.

▶️ Demo

We support text-file or keyboard input; the generated motions are saved as npy files. Please check configs/assets.yaml for the path configuration; TEST.FOLDER is the output folder.

Then, run the following script:

python demo.py --cfg ./configs/config_mld_humanml3d.yaml --cfg_assets ./configs/assets.yaml --example ./demo/example.txt

Some parameters:

  • --example=./demo/example.txt: input file of text prompts
  • --task=text_motion: generate motions from the test set of the dataset
  • --task=random_sampling: sample random motions from noise
  • --replication: generate motions for the same input texts multiple times
  • --allinone: store all generated motions in a single npy file with the shape [num_samples, num_replication, num_frames, num_joints, xyz]

The outputs:

  • npy file: the generated motion with shape (nframe, 22, 3)
  • text file: the input text prompt
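As a quick sanity check, you can load the outputs with NumPy. The file names below are hypothetical (substitute files from your TEST.FOLDER); here we fabricate arrays with the documented shapes instead of loading from disk:

```python
import numpy as np

# Per-prompt output: (nframe, 22 joints, xyz).
# motion = np.load("results/sample_0.npy")          # hypothetical path
motion = np.zeros((196, 22, 3))                     # placeholder with the documented shape

# --allinone output: [num_samples, num_replication, num_frames, num_joints, xyz].
# all_motions = np.load("results/all.npy")          # hypothetical path
all_motions = np.zeros((2, 3, 196, 22, 3))          # placeholder with the documented shape

root = motion[:, 0, :]                                  # root joint trajectory over time
speed = np.linalg.norm(np.diff(root, axis=0), axis=1)   # per-frame root displacement
first = all_motions[0, 0]                               # first replication of the first prompt
print(motion.shape, speed.shape, first.shape)
```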

💻 Train your own models

1. Prepare the datasets

Please refer to HumanML3D for text-to-motion dataset setup. We will provide instructions for other datasets soon.

2.1. Ready to train VAE model

Please first check the parameters in configs/config_vae_humanml3d.yaml, e.g. NAME and DEBUG.

Then, run the following command:

python -m train --cfg configs/config_vae_humanml3d.yaml --cfg_assets configs/assets.yaml --batch_size 64 --nodebug

2.2. Ready to train MLD model

Please update the parameters in configs/config_mld_humanml3d.yaml, e.g. NAME, DEBUG, and PRETRAINED_VAE (set it to your latest VAE checkpoint path from the previous step).
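As a rough sketch, the fields to edit might look like the following; the key nesting and the checkpoint path are placeholders, so check the actual config file:

```yaml
# Hypothetical sketch -- verify the exact key nesting in your config.
NAME: mld_humanml3d_run1                                   # experiment name
DEBUG: false                                               # disable debug mode for real training
PRETRAINED_VAE: ./experiments/vae/checkpoints/latest.ckpt  # placeholder: your VAE ckpt from step 2.1
```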

Then, run the following command:

python -m train --cfg configs/config_mld_humanml3d.yaml --cfg_assets configs/assets.yaml --batch_size 64 --nodebug

3. Evaluate the model

Please first put the trained model checkpoint path into TEST.CHECKPOINT in configs/config_mld_humanml3d.yaml.

Then, run the following command:

python -m test --cfg configs/config_mld_humanml3d.yaml --cfg_assets configs/assets.yaml

👀 Visualization

1. Set up Blender - WIP

Refer to TEMOS-Rendering motions for the Blender setup, then install the following dependencies:

YOUR_BLENDER_PYTHON_PATH/python -m pip install -r prepare/requirements_render.txt

2. (Optional) Render rigged cylinders

Run the following command using blender:

YOUR_BLENDER_PATH/blender --background --python render.py -- --cfg=./configs/render.yaml --dir=YOUR_NPY_FOLDER --mode=video --joint_type=HumanML3D

3. Create SMPL meshes with:

python -m fit --dir YOUR_NPY_FOLDER --save_folder TEMP_PLY_FOLDER --cuda

This outputs:

  • mesh npy file: the generated SMPL vertices with shape (nframe, 6893, 3)
  • ply files: the ply mesh files for Blender or MeshLab
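Before rendering, you can sanity-check the vertex output in NumPy. The path is hypothetical (substitute a mesh npy from your TEMP_PLY_FOLDER); here we fabricate an array with the documented shape instead of loading from disk:

```python
import numpy as np

# verts = np.load("TEMP_PLY_FOLDER/sample_0_mesh.npy")  # hypothetical path
verts = np.zeros((196, 6893, 3))   # placeholder: (nframe, vertices, xyz) per the README

# Vertical extent per frame: a quick proxy for fitting quality, since it
# should stay near a plausible human height once SMPL fitting converges.
heights = verts[..., 1].max(axis=1) - verts[..., 1].min(axis=1)
print(verts.shape, heights.shape)
```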

4. Render SMPL meshes

Run the following command to render SMPL using blender:

YOUR_BLENDER_PATH/blender --background --python render.py -- --cfg=./configs/render.yaml --dir=YOUR_NPY_FOLDER --mode=video --joint_type=HumanML3D

Optional parameters:

  • --mode=video: render an mp4 video
  • --mode=sequence: render the whole motion in a single png image

❓ FAQ

If your demo results have a severe foot-sliding issue, please take a look at the following. It can happen when self.feats2joints (which uses the mean and std for de-normalization) is broken. https://github.com/ChenFengYe/motion-latent-diffusion/blob/af507c479d771f62a058b5b6abb51276b36d6c6d/mld/models/modeltype/mld.py#L264 https://github.com/ChenFengYe/motion-latent-diffusion/blob/5c264c31fbc7ffc047be1ce003622f1865417e8f/mld/data/get_data.py#L26-L41

  1. GPUs. You can indicate the IDs to use all your GPUs. https://github.com/ChenFengYe/motion-latent-diffusion/blob/6643f175fbcd914312fa5f570e3dc7ab57994075/configs/config_vae_humanml3d.yaml#L4
  2. Epoch numbers. 1500~3000 epochs are enough for VAE or MLD. We suggest using wandb (preferred) or TensorBoard to check the FID curve of your training.
  3. Training speed. 2000 epochs can take about one day on a single GPU and around 12 hours on 8 GPUs. Training speed also depends on VAL_EVERY_STEPS (validation frequency) and data I/O speed.
    https://github.com/ChenFengYe/motion-latent-diffusion/blob/6643f175fbcd914312fa5f570e3dc7ab57994075/configs/config_vae_humanml3d.yaml#L77
  4. Data log. Only the loss is printed by default. After each validation pass, the validation metrics are also printed. More details are available in wandb (preferred) or TensorBoard.
  5. Debug or not. Please use --nodebug for all your training runs.
  6. VAE loading. Please load your pre-trained VAE correctly for the MLD diffusion training.
  7. FID. The validation FID will drop to 0.5~1 after 1500 epochs for both VAE and MLD training. By default, validation runs on the test split: https://github.com/ChenFengYe/motion-latent-diffusion/blob/6643f175fbcd914312fa5f570e3dc7ab57994075/configs/config_vae_humanml3d.yaml#L30
To count FLOPs or visualize the latent space (see the 2023/03/08 news item), run:

python -m scripts.flops --cfg configs/your_config.yaml
python -m scripts.tsne --cfg configs/your_config.yaml

Note: these scripts only support action-to-motion models for now.

Details of configuration

Citation

If you find our code or paper helpful, please consider citing:

@inproceedings{chen2023executing,
  title={Executing your Commands via Motion Diffusion in Latent Space},
  author={Chen, Xin and Jiang, Biao and Liu, Wen and Huang, Zilong and Fu, Bin and Chen, Tao and Yu, Gang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={18000--18010},
  year={2023}
}

Acknowledgments

Thanks to TEMOS, ACTOR, HumanML3D, and joints2smpl; our code partially borrows from them.

License

This code is distributed under an MIT LICENSE.

Note that our code depends on other libraries, including SMPL, SMPL-X, PyTorch3D, and uses datasets which each have their own respective licenses that must also be followed.
