ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection (MM 2020)
GPL-3.0 License
This repository maintains the official implementation of the paper ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection by Ye Liu, Junsong Yuan and Chang Wen Chen, which has been accepted by ACM Multimedia 2020.
The ConsNet package can be installed directly from PyPI or manually from source, depending on how you intend to use it. Please refer to the following environment settings that we use.
You may install ConsNet from PyPI and import it in your own project as a Python package. The library implements several utilities, including Pair IoU, Pair NMS and unified APIs for the HICO-DET dataset.
Simply run the following command to install the latest version of ConsNet.
pip install consnet
For more details about consnet.api, please refer to our documentation.
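To make the terms Pair IoU and Pair NMS concrete, here is a minimal pure-Python sketch. It is not the consnet.api implementation: the function names, the box format (x1, y1, x2, y2) and the definition of Pair IoU as the smaller of the human IoU and the object IoU are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Box = Sequence[float]   # assumed format: (x1, y1, x2, y2)
Pair = Tuple[Box, Box]  # (human box, object box)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def pair_iou(p: Pair, q: Pair) -> float:
    """Assumed definition: min of the human IoU and the object IoU."""
    return min(box_iou(p[0], q[0]), box_iou(p[1], q[1]))


def pair_nms(pairs: List[Pair], scores: List[float],
             thr: float = 0.5) -> List[int]:
    """Greedy NMS over human-object pairs; returns kept indices."""
    # Visit pairs from the highest score to the lowest.
    order = sorted(range(len(pairs)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a pair only if it does not overlap any already-kept pair.
        if all(pair_iou(pairs[i], pairs[j]) <= thr for j in keep):
            keep.append(i)
    return keep
```

Under this definition, two pairs count as overlapping only when both the human boxes and the object boxes overlap, which is why the minimum is used rather than the mean.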
By installing ConsNet from source, you may access the full capabilities of this project, including pooling object features, constructing the consistency graph and benchmarking the ConsNet model.
git clone https://github.com/yeliudev/ConsNet.git
cd ConsNet
pip install -e .[full]
We pre-extract the visual features of all the humans and objects in the dataset and save them for training as well as testing. These features are also used to construct the consistency graph. Please refer to our paper for more details about feature extraction and data sampling.
ROOT='https://s3-us-west-2.amazonaws.com/allennlp/models/elmo'
ELMO='2x4096_512_2048cnn_2xhighway_5.5B'
# Download object detector checkpoints
wget https://huggingface.co/yeliudev/ConsNet/resolve/main/faster_rcnn_r50_fpn_3x_coco-26df6f6b.pth
wget https://huggingface.co/yeliudev/ConsNet/resolve/main/faster_rcnn_r50_fpn_20e_hico_det-77b91312.pth
# Download ELMo options and weights
wget ${ROOT}/${ELMO}/elmo_${ELMO}_options.json
wget ${ROOT}/${ELMO}/elmo_${ELMO}_weights.hdf5
ConsNet
├── configs
├── consnet
├── tools
├── checkpoints
│   ├── faster_rcnn_r50_fpn_3x_coco-26df6f6b.pth
│   ├── faster_rcnn_r50_fpn_20e_hico_det-77b91312.pth
│   ├── elmo_2x4096_512_2048cnn_2xhighway_5.5B_options.json
│   └── elmo_2x4096_512_2048cnn_2xhighway_5.5B_weights.hdf5
├── data
│   └── hico_20160224_det
│       ├── anno_bbox.mat
│       └── images
│           ├── train2015
│           └── test2015
├── README.md
├── setup.py
└── ···
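Before running the tools, it can help to confirm that the files are where the layout above expects them. The following is a small sketch, not part of the repository: `EXPECTED` and `missing_paths` are hypothetical names, and the nesting of `anno_bbox.mat` and `images` under `data/hico_20160224_det` is assumed from the layout.

```python
from pathlib import Path
from typing import List

# Paths copied from the directory layout, relative to the repository root.
# The nesting under data/hico_20160224_det is an assumption.
EXPECTED = [
    "checkpoints/faster_rcnn_r50_fpn_3x_coco-26df6f6b.pth",
    "checkpoints/faster_rcnn_r50_fpn_20e_hico_det-77b91312.pth",
    "checkpoints/elmo_2x4096_512_2048cnn_2xhighway_5.5B_options.json",
    "checkpoints/elmo_2x4096_512_2048cnn_2xhighway_5.5B_weights.hdf5",
    "data/hico_20160224_det/anno_bbox.mat",
    "data/hico_20160224_det/images/train2015",
    "data/hico_20160224_det/images/test2015",
]


def missing_paths(root: str = ".") -> List[str]:
    """Return the expected paths that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).exists()]
```

Calling `missing_paths()` from the repository root returns an empty list when every checkpoint and dataset path is in place.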
data/hico_det/annotations.
python tools/convert_anno.py
data/hico_det.
python tools/build_dataset.py --checkpoint <path-to-checkpoint>
Run the following command to train a model using specified configs.
python tools/launch.py --config <path-to-config>
Run the following command to test a model and evaluate results.
python tools/launch.py --config <path-to-config> --checkpoint <path-to-checkpoint> --eval
We provide multiple HICO-DET pre-trained models here. All the models are trained using a single NVIDIA Tesla V100-SXM2 GPU and are evaluated under the default metric of the HICO-DET dataset.
Note that the types UC, UO, UA and GT represent the unseen action-object combination, unseen object, unseen action and ground truth scenarios, respectively.
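As an illustration of how the three zero-shot types differ, the sketch below labels a single HOI category relative to a set of categories seen during training. It is not taken from the ConsNet code base: the function name, the (action, object) tuple format and the "UA+UO" label for the case where both parts are unseen are assumptions, and the GT scenario is not covered.

```python
from typing import Iterable, Set, Tuple

HOI = Tuple[str, str]  # assumed format: (action, object)


def zero_shot_type(hoi: HOI, seen: Iterable[HOI]) -> str:
    """Label `hoi` as 'seen', 'UC', 'UO', 'UA' or 'UA+UO'."""
    seen_set: Set[HOI] = set(seen)
    if hoi in seen_set:
        return "seen"
    seen_actions = {a for a, _ in seen_set}
    seen_objects = {o for _, o in seen_set}
    action, obj = hoi
    if action not in seen_actions and obj not in seen_objects:
        return "UA+UO"  # both parts unseen (label chosen for this sketch)
    if obj not in seen_objects:
        return "UO"     # the object never appears in training
    if action not in seen_actions:
        return "UA"     # the action never appears in training
    return "UC"         # both parts seen, but never together
```

For example, with training categories ("ride", "bicycle") and ("hold", "cup"), the category ("hold", "bicycle") is labeled UC because both the action and the object were seen, just not in combination.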
Thanks to the modularized implementation based on NNCore, this project is highly customizable, with a number of replaceable modules. You may adjust the hyperparameters in configs or construct your own HOI detection pipeline by replacing the dataset, detector, embedder, etc. Please check the documentation of NNCore for more details about customizing the engine and modules.
If you find this project useful for your research, please kindly cite our paper.
@inproceedings{liu2020consnet,
title={ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection},
author={Liu, Ye and Yuan, Junsong and Chen, Chang Wen},
booktitle={Proceedings of The ACM International Conference on Multimedia (MM)},
pages={4235--4243},
year={2020}
}