Apache-2.0 License
This repository contains the training code of MDef-DETR. The paper is available on arXiv.
pip install -r requirements.txt
Distributed training is available via Slurm and submitit:
pip install submitit
The config file for pretraining is configs/pretrain.json and looks like:
{
    "combine_datasets": ["flickr", "mixed"],
    "combine_datasets_val": ["flickr", "gqa", "refexp"],
    "coco_path": "",
    "vg_img_path": "",
    "flickr_img_path": "",
    "refexp_ann_path": "annotations/",
    "flickr_ann_path": "annotations/",
    "gqa_ann_path": "annotations/",
    "refexp_dataset_name": "all",
    "GT_type": "separate",
    "flickr_dataset_path": ""
}
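As a sketch, the blank path fields in this config could also be filled in programmatically before launching training. The `DATA_ROOT` value and the per-dataset folder names below are assumptions about your local layout, not part of the repository:

```python
import json

# The pretraining config from configs/pretrain.json (paths left blank).
config = {
    "combine_datasets": ["flickr", "mixed"],
    "combine_datasets_val": ["flickr", "gqa", "refexp"],
    "coco_path": "",
    "vg_img_path": "",
    "flickr_img_path": "",
    "refexp_ann_path": "annotations/",
    "flickr_ann_path": "annotations/",
    "gqa_ann_path": "annotations/",
    "refexp_dataset_name": "all",
    "GT_type": "separate",
    "flickr_dataset_path": "",
}

# Hypothetical local layout; adjust DATA_ROOT to wherever you keep the data.
DATA_ROOT = "data"
config["coco_path"] = f"{DATA_ROOT}/coco"
config["vg_img_path"] = f"{DATA_ROOT}/vg_images"
config["flickr_img_path"] = f"{DATA_ROOT}/flickr_images"
config["flickr_dataset_path"] = f"{DATA_ROOT}/flickr_annotations"

# Write the filled-in config back out (here, just shown on stdout).
print(json.dumps(config, indent=4))
```

In practice you would dump the result back to configs/pretrain.json instead of printing it.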
Set the following fields in the config:
- flickr_img_path: the folder containing the images.
- flickr_dataset_path: the folder with the annotations.
- vg_img_path: the folder containing the images.
- coco_path: the folder containing the downloaded images.
- flickr_ann_path, gqa_ann_path and refexp_ann_path: the folder with the pre-processed annotations.
Alternatively, you can download the preprocessed data from the link as a single zip file and extract it under the data directory.
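Since a misconfigured path typically only surfaces once data loading starts, a small sanity check before launching training can save a wasted Slurm allocation. This is a minimal sketch (the helper name and key list are my own, not part of the repo):

```python
import os

# Path-valued keys from configs/pretrain.json.
PATH_KEYS = [
    "coco_path", "vg_img_path", "flickr_img_path",
    "flickr_dataset_path", "refexp_ann_path",
    "flickr_ann_path", "gqa_ann_path",
]

def missing_paths(config):
    """Return the path keys whose value is empty or not an existing directory."""
    return [
        k for k in PATH_KEYS
        if not config.get(k) or not os.path.isdir(config[k])
    ]

# Example: an unfilled config is flagged in full.
example = {k: "" for k in PATH_KEYS}
print(missing_paths(example))
```

Running the check on the real config before `run_with_submitit.py` and aborting if the list is non-empty is left to taste.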
The following command reproduces the training of the ResNet-101 model.
python run_with_submitit.py --dataset_config configs/pretrain.json --ngpus 8 --nodes 4 --ema --epochs 20 --lr_drop 16
If you use our work, please consider citing MDef-DETR:
@article{Maaz2021Multimodal,
title={Multi-modal Transformers Excel at Class-agnostic Object Detection},
author={Muhammad Maaz and Hanoona Rasheed and Salman Khan and Fahad Shahbaz Khan and Rao Muhammad Anwer and Ming-Hsuan Yang},
journal={arXiv preprint arXiv:2111.11430},
year={2021}
}
This codebase is modified from the MDETR repository. We thank them for their implementation.