
Introduction

Welcome to DAMO-YOLO! It is a fast and accurate object detection method developed by the TinyML Team from Alibaba DAMO Data Analytics and Intelligence Lab, and it achieves higher performance than the state-of-the-art YOLO series. DAMO-YOLO extends YOLO with several new techniques, including Neural Architecture Search (NAS) backbones, an efficient Reparameterized Generalized-FPN (RepGFPN), a lightweight head with AlignedOTA label assignment, and distillation enhancement. For more details, please refer to our arXiv report. Here you can find not only powerful models, but also highly efficient training strategies and complete tools from training to deployment.

Updates

[2023/04/12: We release DAMO-YOLO v0.3.1!]
- Add a 701-category DAMO-YOLO-S model, which covers more application scenarios and serves as a high-quality pre-trained model to improve the performance of downstream tasks.
- Upgrade the DAMO-YOLO-Nano series models, which achieve 32.3/38.2/40.5 mAP with only 1.56/3.69/6.04 GFLOPs and run in real time at 4.08/5.05/6.69 ms on an Intel CPU.
- Add DAMO-YOLO-L model, which achieves 51.9 mAP with 7.95ms latency using T4-GPU.
[2023/03/13: We release DAMO-YOLO v0.3.0!]
- Release DAMO-YOLO-Nano, which achieves 35.1 mAP with only 3.02GFlops.
- Upgrade the optimizer builder: by editing the optimizer config, you can use any optimizer supported by PyTorch.
- Upgrade the data loading pipeline and training parameters, leading to significant improvements of DAMO-YOLO models, e.g., the mAP of DAMO-YOLO-T/S/M increased from 43.0/46.8/50.0 to 43.6/47.7/50.2 respectively.
[2023/02/15: Baseline for The 3rd Anti-UAV Challenge.]
- You are welcome to join the 3rd Anti-UAV Challenge at CVPR 2023. The challenge provides baseline models trained with DAMO-YOLO, which can be found at DamoYolo_Anti-UAV-23_S.
[2023/01/07: We release DAMO-YOLO v0.2.1!]
- Add a TensorRT Int8 Quantization Tutorial, which achieves a 19% speed-up with only 0.3% accuracy loss.
- Add general demo tools that support TensorRT/ONNX/Torch-based video/image/camera inference.
- Add more industry application models, including human detection, helmet detection, facemask detection and cigarette detection.
- Add third-party resources, including DAMO-YOLO Code Interpretation, Practical Example for Finetuning on Custom Dataset.
[2022/12/15: We release DAMO-YOLO v0.1.1!]
- Add a detailed Custom Dataset Finetune Tutorial.
- Fix the training hang caused by data without labels (e.g., ISSUE#30). Feel free to contact us; we are on standby 24 hours a day.

Web Demo

DAMO-YOLO-T, DAMO-YOLO-S and DAMO-YOLO-M are integrated into ModelScope, and training is now supported there as well. Come and try DAMO-YOLO with the free GPU resources provided by ModelScope.

Model Zoo

General Models

Model	size	mAPval0.5:0.95	Latency T4TRT-FP16-BS1	FLOPs(G)	Params(M)	AliYun Download	Google Download
DAMO-YOLO-T	640	42.0	2.78	18.1	8.5	torch,onnx	--
DAMO-YOLO-T*	640	43.6	2.78	18.1	8.5	torch,onnx	--
DAMO-YOLO-S	640	46.0	3.83	37.8	16.3	torch,onnx	--
DAMO-YOLO-S*	640	47.7	3.83	37.8	16.3	torch,onnx	--
DAMO-YOLO-M	640	49.2	5.62	61.8	28.2	torch,onnx	--
DAMO-YOLO-M*	640	50.2	5.62	61.8	28.2	torch,onnx	--
DAMO-YOLO-L	640	50.8	7.95	97.3	42.1	torch,onnx	--
DAMO-YOLO-L*	640	51.9	7.95	97.3	42.1	torch,onnx	--

Model	size	mAPval0.5:0.95	Latency T4TRT-FP16-BS1	FLOPs(G)	Params(M)	AliYun Download	Google Download
DAMO-YOLO-T	640	41.8	2.78	18.1	8.5	torch,onnx	torch,onnx
DAMO-YOLO-T*	640	43.0	2.78	18.1	8.5	torch,onnx	torch,onnx
DAMO-YOLO-S	640	45.6	3.83	37.8	16.3	torch,onnx	torch,onnx
DAMO-YOLO-S*	640	46.8	3.83	37.8	16.3	torch,onnx	torch,onnx
DAMO-YOLO-M	640	48.7	5.62	61.8	28.2	torch,onnx	torch,onnx
DAMO-YOLO-M*	640	50.0	5.62	61.8	28.2	torch,onnx	torch,onnx

We report the mAP of models on COCO2017 validation set, with multi-class NMS.
The latency in this table is measured without post-processing(NMS).
* denotes the model trained with distillation.
We use S as the teacher to distill T, M as the teacher to distill S, and L as the teacher to distill M, while L is distilled by itself.
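The effect of distillation can be read directly off the first General Models table. The sketch below is only an illustration: it copies the mAP values and the teacher pairing from the table and notes above, and the names `BASELINE_MAP`, `DISTILLED_MAP`, `TEACHER` and `distillation_gain` are hypothetical helpers, not part of DAMO-YOLO.

```python
# mAP values copied from the first General Models table (COCO2017 val).
BASELINE_MAP = {"T": 42.0, "S": 46.0, "M": 49.2, "L": 50.8}
DISTILLED_MAP = {"T": 43.6, "S": 47.7, "M": 50.2, "L": 51.9}

# Teacher used for each student, as stated in the table notes.
TEACHER = {"T": "S", "S": "M", "M": "L", "L": "L"}


def distillation_gain(size: str) -> float:
    """Return the mAP improvement of the distilled (*) model over its baseline."""
    return round(DISTILLED_MAP[size] - BASELINE_MAP[size], 1)


if __name__ == "__main__":
    for size in ("T", "S", "M", "L"):
        print(
            f"DAMO-YOLO-{size}: teacher=DAMO-YOLO-{TEACHER[size]}, "
            f"gain=+{distillation_gain(size)} mAP"
        )
```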

Light Models

Model	size	mAPval0.5:0.95	Latency(ms) CPU OpenVino-Intel8163	FLOPs(G)	Params(M)	AliYun Download	Google Download
DAMO-YOLO-Ns	416	32.3	4.08	1.56	1.41	torch,onnx	--
DAMO-YOLO-Nm	416	38.2	5.05	3.69	2.71	torch,onnx	--
DAMO-YOLO-Nl	416	40.5	6.69	6.04	5.69	torch,onnx	--

We report the mAP of models on COCO2017 validation set, with multi-class NMS.
The latency in this table is measured without post-processing, following picodet.

The latency is evaluated based on OpenVINO-2022.3.0, using commands below:

# onnx export, enable --benchmark to ignore postprocess
python tools/converter.py -f configs/damoyolo_tinynasL18_Ns.py -c ../damoyolo_tinynasL18_Ns.pth --batch_size 1  --img_size 416 --benchmark
# model transform
mo --input_model damoyolo_tinynasL18_Ns.onnx --data_type FP16
# latency benchmark
./benchmark_app -m damoyolo_tinynasL18_Ns.xml -i ./assets/dog.jpg -api sync -d CPU -b 1 -hint latency
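For a rough sense of what these latencies mean in practice, a per-image latency can be converted into an approximate throughput. This is a minimal sketch under the assumption of batch size 1 and synchronous inference (as in the benchmark command above); it ignores post-processing, so real end-to-end rates will be lower. `latency_to_fps` is a hypothetical helper name.

```python
# Latencies in milliseconds copied from the Light Models table.
LATENCY_MS = {
    "DAMO-YOLO-Ns": 4.08,
    "DAMO-YOLO-Nm": 5.05,
    "DAMO-YOLO-Nl": 6.69,
}


def latency_to_fps(latency_ms: float) -> float:
    """Convert a per-image latency in milliseconds to images per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


if __name__ == "__main__":
    for name, ms in LATENCY_MS.items():
        print(f"{name}: {ms} ms -> about {latency_to_fps(ms):.0f} images/s")
```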

701 categories DAMO-YOLO Model

We provide DAMO-YOLO-S model with 701 categories for general object detection, which has been trained on a large dataset including COCO, Objects365 and OpenImage. This model can also serve as a pre-trained model for fine-tuning in downstream tasks, enabling you to achieve better performance with ease.

Pretrained Model	Downstream Task	mAPval0.5:0.95	AliYun Download	Google Download
80-categories-DAMO-YOLO-S	VisDrone	24.6	torch,onnx	-
701-categories-DAMO-YOLO-S	VisDrone	26.6	torch,onnx	-

Note: The downloadable model is the model pretrained on the 701-category datasets. The VisDrone results are shown to demonstrate that this pretrained model can improve the performance of downstream tasks.

Quick Start

Step1. Install DAMO-YOLO.

git clone https://github.com/tinyvision/damo-yolo.git
cd damo-yolo/
conda create -n DAMO-YOLO python=3.7 -y
conda activate DAMO-YOLO
conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt
export PYTHONPATH=$PWD:$PYTHONPATH
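After installation it can help to confirm that the interpreter and packages are the ones expected. The following is a minimal sketch, not part of the project: it only checks that the packages named in the install commands are importable and that the Python version matches the one used to create the conda environment. `missing_packages` and `python_matches` are hypothetical helper names.

```python
import importlib.util
import sys

# Packages installed by the conda command above.
REQUIRED = ("torch", "torchvision", "torchaudio")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_matches(major=3, minor=7):
    """Check whether the running interpreter is the expected Python version."""
    return sys.version_info[:2] == (major, minor)


if __name__ == "__main__":
    missing = missing_packages()
    print("missing packages:", missing if missing else "none")
    print("python 3.7:", python_matches())
```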

Step2. Install pycocotools.

pip install cython
pip install git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI # for Linux
pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI # for Windows

Step1. Download a pretrained torch, onnx or tensorRT engine from the benchmark table, e.g., damoyolo_tinynasL25_S.pth, damoyolo_tinynasL25_S.onnx, damoyolo_tinynasL25_S.trt.

Step2. Use -f (config filename) to specify your detector's config and --path to specify the input data path. Image, video and camera inputs are supported. For example:

# torch engine with image
python tools/demo.py image -f ./configs/damoyolo_tinynasL25_S.py --engine ./damoyolo_tinynasL25_S.pth --conf 0.6 --infer_size 640 640 --device cuda --path ./assets/dog.jpg

# onnx engine with video
python tools/demo.py video -f ./configs/damoyolo_tinynasL25_S.py --engine ./damoyolo_tinynasL25_S.onnx --conf 0.6 --infer_size 640 640 --device cuda --path your_video.mp4

# tensorRT engine with camera
python tools/demo.py camera -f ./configs/damoyolo_tinynasL25_S.py --engine ./damoyolo_tinynasL25_S.trt --conf 0.6 --infer_size 640 640 --device cuda --camid 0
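The three demo invocations differ only in the input mode, the engine file and the input source. As an illustration, the sketch below assembles the same argument lists in Python using only the flags shown above; `build_demo_command` is a hypothetical helper, and the default values simply mirror the examples.

```python
def build_demo_command(mode, config, engine, path=None, camid=0,
                       conf=0.6, infer_size=(640, 640), device="cuda"):
    """Build the tools/demo.py argument list for image, video or camera input."""
    if mode not in ("image", "video", "camera"):
        raise ValueError(f"unsupported mode: {mode}")
    cmd = [
        "python", "tools/demo.py", mode,
        "-f", config,
        "--engine", engine,
        "--conf", str(conf),
        "--infer_size", str(infer_size[0]), str(infer_size[1]),
        "--device", device,
    ]
    if mode == "camera":
        # Camera input is selected by device id instead of a file path.
        cmd += ["--camid", str(camid)]
    else:
        if path is None:
            raise ValueError("image and video modes need an input path")
        cmd += ["--path", path]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_demo_command(
        "image",
        "./configs/damoyolo_tinynasL25_S.py",
        "./damoyolo_tinynasL25_S.pth",
        path="./assets/dog.jpg",
    )))
```

The returned list can be passed to `subprocess.run` from the repository root if you prefer to drive the demo from a script.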

Step1. Prepare COCO dataset

cd <DAMO-YOLO Home>
ln -s /path/to/your/coco ./datasets/coco

Step2. Reproduce our results on COCO by specifying -f (config filename)

python -m torch.distributed.launch --nproc_per_node=8 tools/train.py -f configs/damoyolo_tinynasL25_S.py

Please refer to custom dataset tutorial for details.

python -m torch.distributed.launch --nproc_per_node=8 tools/eval.py -f configs/damoyolo_tinynasL25_S.py --ckpt /path/to/your/damoyolo_tinynasL25_S.pth
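The training and evaluation launches share the same structure and differ only in the script and the checkpoint argument. A minimal sketch of a helper that assembles either command, for example to adapt `--nproc_per_node` to a machine with fewer GPUs, could look like this. `build_launch_command` is a hypothetical name; only flags that appear in the commands above are used.

```python
def build_launch_command(script, config, nproc_per_node=8, ckpt=None):
    """Build a torch.distributed.launch command for tools/train.py or tools/eval.py."""
    if nproc_per_node < 1:
        raise ValueError("nproc_per_node must be at least 1")
    cmd = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={nproc_per_node}",
        script,
        "-f", config,
    ]
    if ckpt is not None:
        # Evaluation additionally needs the checkpoint to load.
        cmd += ["--ckpt", ckpt]
    return cmd


if __name__ == "__main__":
    config = "configs/damoyolo_tinynasL25_S.py"
    print(" ".join(build_launch_command("tools/train.py", config)))
    print(" ".join(build_launch_command(
        "tools/eval.py", config,
        ckpt="/path/to/your/damoyolo_tinynasL25_S.pth",
    )))
```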

Step2. After the search process has completed, replace the structure text in the configs with the search result. You can then get your own custom ResNet-like or CSPNet-like backbone by setting the backbone name to TinyNAS_res or TinyNAS_csp. Please note the difference in out_indices between TinyNAS_res and TinyNAS_csp.

structure = self.read_structure('tinynas_customize.txt')
# Option A: ResNet-like TinyNAS backbone
TinyNAS = {'name': 'TinyNAS_res',
           'out_indices': (2, 4, 5)}
# Option B: CSPNet-like TinyNAS backbone (use instead of Option A)
TinyNAS = {'name': 'TinyNAS_csp',
           'out_indices': (2, 3, 4)}
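Because the two backbone families need different out_indices, it is easy to pair a name with the wrong tuple. One way to guard against that, sketched here as an assumption rather than project code, is to look the indices up from the backbone name. `tinynas_backbone` is a hypothetical helper; the index values are the ones given in the snippet above.

```python
# out_indices for each backbone family, taken from the config snippet above.
OUT_INDICES = {
    "TinyNAS_res": (2, 4, 5),  # ResNet-like
    "TinyNAS_csp": (2, 3, 4),  # CSPNet-like
}


def tinynas_backbone(name: str) -> dict:
    """Return a backbone config dict with the out_indices that match the name."""
    if name not in OUT_INDICES:
        raise ValueError(f"unknown backbone {name!r}; expected one of {sorted(OUT_INDICES)}")
    return {"name": name, "out_indices": OUT_INDICES[name]}


if __name__ == "__main__":
    print(tinynas_backbone("TinyNAS_res"))
    print(tinynas_backbone("TinyNAS_csp"))
```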

Deploy

Step1. Install ONNX.

pip install onnx==1.8.1
pip install onnxruntime==1.8.0
pip install onnx-simplifier==0.3.5

Step2. Install CUDA, CuDNN, TensorRT and pyCUDA

2.1 CUDA

wget https://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run
sudo sh cuda_10.2.89_440.33.01_linux.run
export PATH=$PATH:/usr/local/cuda-10.2/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.2/lib64
source ~/.bashrc

2.2 CuDNN

sudo cp cuda/include/* /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*

2.3 TensorRT

cd TensorRT-7.2.1.6/python
pip install tensorrt-7.2.1.6-cp37-none-linux_x86_64.whl
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:TensorRT-7.2.1.6/lib

2.4 pycuda

pip install pycuda==2022.1

trt_int8 quantization is now supported: specify trt_type as int8 to export an int8 tensorRT engine. You can also try partial quantization to achieve a good compromise between accuracy and latency. Refer to partial_quantization for more details.

Step.1 Convert the torch model to an onnx or trt engine; the output file is generated in ./deploy. end2end exports the trt engine with NMS. trt_eval evaluates the exported trt engine on the coco_val dataset after the export completes.

# onnx export 
python tools/converter.py -f configs/damoyolo_tinynasL25_S.py -c damoyolo_tinynasL25_S.pth --batch_size 1 --img_size 640

# trt export
python tools/converter.py -f configs/damoyolo_tinynasL25_S.py -c damoyolo_tinynasL25_S.pth --batch_size 1 --img_size 640 --trt --end2end --trt_eval

Step.2 Evaluate the trt engine on the coco_val dataset. end2end means using trt_with_nms for evaluation.

python tools/trt_eval.py -f configs/damoyolo_tinynasL25_S.py -trt deploy/damoyolo_tinynasL25_S_end2end_fp16_bs1.trt --batch_size 1 --img_size 640 --end2end
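The engine path passed to -trt follows from the export settings. The sketch below reconstructs that path from the config name; the naming pattern is an assumption inferred from the single example above (`damoyolo_tinynasL25_S_end2end_fp16_bs1.trt`), and in particular the name produced without end2end is a guess, so check the contents of ./deploy after exporting. `expected_engine_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def expected_engine_path(config_path, end2end=True, trt_type="fp16",
                         batch_size=1, out_dir="deploy"):
    """Guess the exported engine path from the config file name and export options."""
    # e.g. "configs/damoyolo_tinynasL25_S.py" -> "damoyolo_tinynasL25_S"
    stem = PurePosixPath(config_path).stem
    parts = [stem]
    if end2end:
        parts.append("end2end")
    parts += [trt_type, f"bs{batch_size}"]
    return str(PurePosixPath(out_dir) / ("_".join(parts) + ".trt"))


if __name__ == "__main__":
    print(expected_engine_path("configs/damoyolo_tinynasL25_S.py"))
```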

Step.3 Run the onnx or trt engine inference demo, specifying the test image/video with --path. end2end means using trt_with_nms for inference.

# onnx inference
python tools/demo.py image -f ./configs/damoyolo_tinynasL25_S.py --engine ./damoyolo_tinynasL25_S.onnx --conf 0.6 --infer_size 640 640 --device cuda --path ./assets/dog.jpg

# trt inference
python tools/demo.py image -f ./configs/damoyolo_tinynasL25_S.py --engine ./deploy/damoyolo_tinynasL25_S_end2end_fp16_bs1.trt --conf 0.6 --infer_size 640 640 --device cuda --path ./assets/dog.jpg --end2end

Industry Application Models

We provide DAMO-YOLO models for applications in real scenarios, which are listed as follows. More powerful models are coming, please stay tuned.

Human Detection	Helmet Detection	Head Detection	Smartphone Detection

Facemask Detection	Cigarette Detection	Traffic Sign Detection	NFL-helmet Detection

Third Party Resources

In order to promote communication among DAMO-YOLO users, we collect third-party resources in this section. If you have original content about DAMO-YOLO, please feel free to contact us at [email protected].

DAMO-YOLO Overview: slides (English), videos (English).
DAMO-YOLO Code Interpretation
Practical Example for Finetuning on Custom Dataset

Cite DAMO-YOLO

If you use DAMO-YOLO in your research, please cite our work by using the following BibTeX entry:

 @article{damoyolo,
   title={DAMO-YOLO: A Report on Real-Time Object Detection Design},
   author={Xianzhe Xu and Yiqi Jiang and Weihua Chen and Yilun Huang and Yuan Zhang and Xiuyu Sun},
   journal={arXiv preprint arXiv:2211.15444v2},
   year={2022},
 }

 @inproceedings{sun2022mae,
   title={Mae-det: Revisiting maximum entropy principle in zero-shot nas for efficient object detection},
   author={Sun, Zhenhong and Lin, Ming and Sun, Xiuyu and Tan, Zhiyu and Li, Hao and Jin, Rong},
   booktitle={International Conference on Machine Learning},
   pages={20810--20826},
   year={2022},
   organization={PMLR}
 }

@inproceedings{jiang2022giraffedet,
  title={GiraffeDet: A Heavy-Neck Paradigm for Object Detection},
  author={Yiqi Jiang and Zhiyu Tan and Junyan Wang and Xiuyu Sun and Ming Lin and Hao Li},
  booktitle={International Conference on Learning Representations},
  year={2022},
}
