OpenMMLab Detection Toolbox and Benchmark
Apache-2.0 License
An Open and Comprehensive Pipeline for Unified Object Grounding and Detection
Grounding-DINO is a state-of-the-art open-set detection model that tackles multiple vision tasks including Open-Vocabulary Detection (OVD), Phrase Grounding (PG), and Referring Expression Comprehension (REC). Its effectiveness has led to its widespread adoption as a mainstream architecture for various downstream applications. However, despite its significance, the original Grounding-DINO model lacks comprehensive public technical details due to the unavailability of its training code. To bridge this gap, we present MM-Grounding-DINO, an open-source, comprehensive, and user-friendly baseline, which is built with the MMDetection toolbox. It adopts abundant vision datasets for pre-training and various detection and grounding datasets for fine-tuning. We give a comprehensive analysis of each reported result and detailed settings for reproduction. The extensive experiments on the benchmarks mentioned demonstrate that our MM-Grounding-DINO-Tiny outperforms the Grounding-DINO-Tiny baseline. We release all our models to the research community.
Details: https://github.com/open-mmlab/mmdetection/tree/main/configs/mm_grounding_dino
Published by hhaAndroid about 1 year ago
v3.2.0 was released on 12/10/2023:
1. Detection Transformer SOTA Model Collection
(1) Supported four updated and stronger SOTA Transformer models: DDQ, CO-DETR, AlignDETR, and H-DINO.
(2) Based on CO-DETR, MMDet released a model with a COCO performance of 64.1 mAP.
(3) Algorithms such as DINO support AMP/Checkpoint/FrozenBN, which can effectively reduce memory usage (a usage sketch follows after this list).
2. Comprehensive Performance Comparison between CNN and Transformer
RF100 is a collection of 100 real-world datasets spanning 7 domains. It can be used to assess the performance differences between Transformer models like DINO and CNN-based algorithms under different scenarios and data volumes. Users can utilize this benchmark to quickly evaluate the robustness of their algorithms in various scenarios.
3. Support for GLIP and Grounding DINO fine-tuning; MMDet is the only algorithm library that supports Grounding DINO fine-tuning
The Grounding DINO implementation in MMDet is the only one that supports fine-tuning, and its fine-tuned performance is one point higher than the official version; the GLIP implementation also outperforms the official version.
We also provide a detailed process for training and evaluating Grounding DINO on custom datasets. Everyone is welcome to give it a try.
| Model | Backbone | Style | COCO mAP | Official COCO mAP |
| --- | --- | --- | --- | --- |
| Grounding DINO-T | Swin-T | Zero-shot | 48.5 | 48.4 |
| Grounding DINO-T | Swin-T | Finetune | 58.1 (+0.9) | 57.2 |
| Grounding DINO-B | Swin-B | Zero-shot | 56.9 | 56.7 |
| Grounding DINO-B | Swin-B | Finetune | 59.7 | |
| Grounding DINO-R50 | R50 | Scratch | 48.9 (+0.8) | 48.1 |
4. Support for the open-vocabulary detection algorithm Detic and multi-dataset joint training.
5. Training detection models using FSDP and DeepSpeed.
| ID | AMP | GC of Backbone | GC of Encoder | FSDP | Peak Mem (GB) | Iter Time (s) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | | | | | 49 (A100) | 0.9 |
| 2 | √ | | | | 39 (A100) | 1.2 |
| 3 | | √ | | | 33 (A100) | 1.1 |
| 4 | √ | √ | | | 25 (A100) | 1.3 |
| 5 | | √ | √ | | 18 | 2.2 |
| 6 | √ | √ | √ | | 13 | 1.6 |
| 7 | | √ | √ | √ | 14 | 2.9 |
| 8 | √ | √ | √ | √ | 8.5 | 2.4 |
6. Support for the V3Det dataset, a large-scale detection dataset with over 13,000 categories.
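On items 1(3) and 3 above, here is a minimal sketch of turning on AMP for fine-tuning from Python. This is not from the release notes: the config path is illustrative, and `AmpOptimWrapper` is MMEngine's mixed-precision optimizer wrapper, which mirrors what the `--amp` option of `tools/train.py` switches to.

```python
from mmengine.config import Config
from mmengine.runner import Runner

# Illustrative config path; substitute the model you want to fine-tune.
cfg = Config.fromfile(
    'configs/grounding_dino/grounding_dino_swin-t_finetune_16xb2_1x_coco.py')
cfg.optim_wrapper.type = 'AmpOptimWrapper'  # enable automatic mixed precision
cfg.optim_wrapper.loss_scale = 'dynamic'    # dynamic loss scaling for FP16
cfg.work_dir = './work_dirs/grounding_dino_finetune_amp'

Runner.from_cfg(cfg).train()
```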
Published by hhaAndroid over 1 year ago
As multimodal vision algorithms continue to evolve, MMDetection has added support for them. This section demonstrates how to use the demo and evaluation scripts of multimodal algorithms, using the GLIP algorithm and model as an example. Moreover, MMDetection integrates a gradio_demo project, which allows developers to quickly try out all image-input tasks in MMDetection on their local devices. Check the document for more details.
Please first make sure that you have the correct dependencies installed:
# if MMDetection was installed from source
pip install -r requirements/multimodal.txt
# if MMDetection was installed as a wheel
mim install mmdet[multimodal]
MMDetection has already implemented the GLIP algorithm and provides pre-trained weights; you can download them directly:
cd mmdetection
wget https://download.openmmlab.com/mmdetection/v3.0/glip/glip_tiny_a_mmdet-b3654169.pth
Once the model is successfully downloaded, you can use the `demo/image_demo.py` script to run inference.
python demo/image_demo.py demo/demo.jpg glip_tiny_a_mmdet-b3654169.pth --texts bench
Demo result will be similar to this:
If users would like to detect multiple targets, please declare them after `--texts` in the format of `xx . xx .`:
python demo/image_demo.py demo/demo.jpg glip_tiny_a_mmdet-b3654169.pth --texts 'bench . car .'
And the result will be like this one:
You can also use a sentence as the input prompt for the `--texts` field, for example:
python demo/image_demo.py demo/demo.jpg glip_tiny_a_mmdet-b3654169.pth --texts 'There are a lot of cars here.'
The result will be similar to this:
The GLIP implementation in MMDetection has no performance degradation; our benchmark is as follows:
| Model | official mAP | mmdet mAP |
| --- | --- | --- |
| glip_A_Swin_T_O365.yaml | 42.9 | 43.0 |
| glip_Swin_T_O365.yaml | 44.9 | 44.9 |
| glip_Swin_L.yaml | 51.4 | 51.3 |
Users can use the test script we provided to run evaluation as well. Here is a basic example:
# 1 gpu
python tools/test.py configs/glip/glip_atss_swin-t_fpn_dyhead_pretrain_obj365.py glip_tiny_a_mmdet-b3654169.pth
# 8 GPU
./tools/dist_test.sh configs/glip/glip_atss_swin-t_fpn_dyhead_pretrain_obj365.py glip_tiny_a_mmdet-b3654169.pth 8
The result will be similar to this:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.428
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.594
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.466
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.300
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.477
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.534
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.634
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.634
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.634
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.473
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.690
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.789
# if MMDetection was installed from source
pip install -r requirements/multimodal.txt
# if MMDetection was installed as a wheel
mim install mmdet[multimodal]
For convenience, you can download the weights to the `mmdetection` root directory:
wget https://download.openmmlab.com/mmdetection/v3.0/xdecoder/xdecoder_focalt_last_novg.pt
wget https://download.openmmlab.com/mmdetection/v3.0/xdecoder/xdecoder_focalt_best_openseg.pt
The above two weights are directly copied from the official website without any modification. The specific source is https://github.com/microsoft/X-Decoder
For convenience of demonstration, please download the folder and place it in the root directory of mmdetection.
(1) Open Vocabulary Semantic Segmentation
cd projects/XDecoder
python demo.py ../../images/animals.png configs/xdecoder-tiny_zeroshot_open-vocab-semseg_coco.py --weights ../../xdecoder_focalt_last_novg.pt --texts zebra.giraffe
(2) Open Vocabulary Instance Segmentation
cd projects/XDecoder
python demo.py ../../images/owls.jpeg configs/xdecoder-tiny_zeroshot_open-vocab-instance_coco.py --weights ../../xdecoder_focalt_last_novg.pt --texts owl
(3) Open Vocabulary Panoptic Segmentation
cd projects/XDecoder
python demo.py ../../images/street.jpg configs/xdecoder-tiny_zeroshot_open-vocab-panoptic_coco.py --weights ../../xdecoder_focalt_last_novg.pt --text car.person --stuff-text tree.sky
(4) Referring Expression Segmentation
cd projects/XDecoder
python demo.py ../../images/fruit.jpg configs/xdecoder-tiny_zeroshot_open-vocab-ref-seg_refcocog.py --weights ../../xdecoder_focalt_last_novg.pt --text "The larger watermelon. The front white flower. White tea pot."
(5) Image Caption
cd projects/XDecoder
python demo.py ../../images/penguin.jpeg configs/xdecoder-tiny_zeroshot_caption_coco2014.py --weights ../../xdecoder_focalt_last_novg.pt
(6) Referring Expression Image Caption
cd projects/XDecoder
python demo.py ../../images/fruit.jpg configs/xdecoder-tiny_zeroshot_ref-caption.py --weights ../../xdecoder_focalt_last_novg.pt --text 'White tea pot'
(7) Text Image Region Retrieval
cd projects/XDecoder
python demo.py ../../images/coco configs/xdecoder-tiny_zeroshot_text-image-retrieval.py --weights ../../xdecoder_focalt_last_novg.pt --text 'pizza on the plate'
The image that best matches the given text is ../../images/coco/000.jpg and probability is 0.998
We have also prepared a gradio program in the `projects/gradio_demo` directory, with which you can interactively run all the inference tasks supported by MMDetection in your browser.
Prepare your dataset according to the docs.
Test Command
Since semantic segmentation is a pixel-level task, we don't need to use a threshold to filter out low-confidence predictions, so we set `model.test_cfg.use_thr_for_mc=False` in the test command.
./tools/dist_test.sh projects/XDecoder/configs/xdecoder-tiny_zeroshot_open-vocab-semseg_ade20k.py xdecoder_focalt_best_openseg.pt 8 --cfg-options model.test_cfg.use_thr_for_mc=False
| Model | mIoU | mIoU (official) | Config |
| --- | --- | --- | --- |
| xdecoder_focalt_best_openseg.pt | 25.24 | 25.13 | config |
Prepare your dataset according to the docs.
./tools/dist_test.sh projects/XDecoder/configs/xdecoder-tiny_zeroshot_open-vocab-instance_ade20k.py xdecoder_focalt_best_openseg.pt 8
| Model | mAP | mAP (official) | Config |
| --- | --- | --- | --- |
| xdecoder_focalt_best_openseg.pt | 10.1 | 10.1 | config |
Prepare your dataset according to the docs.
./tools/dist_test.sh projects/XDecoder/configs/xdecoder-tiny_zeroshot_open-vocab-panoptic_ade20k.py xdecoder_focalt_best_openseg.pt 8
| Model | PQ | PQ (official) | Config |
| --- | --- | --- | --- |
| xdecoder_focalt_best_openseg.pt | 19.11 | 18.97 | config |
Prepare your dataset according to the "(2) use panoptic dataset" part of the docs.
./tools/dist_test.sh projects/XDecoder/configs/xdecoder-tiny_zeroshot_open-vocab-semseg_coco.py xdecoder_focalt_last_novg.pt 8 --cfg-options model.test_cfg.use_thr_for_mc=False
| Model | mIoU | mIoU (official) | Config |
| --- | --- | --- | --- |
| xdecoder-tiny_zeroshot_open-vocab-semseg_coco | 62.1 | 62.1 | config |
Prepare your dataset according to the docs.
./tools/dist_test.sh projects/XDecoder/configs/xdecoder-tiny_zeroshot_open-vocab-instance_coco.py xdecoder_focalt_last_novg.pt 8
| Model | Mask mAP | Mask mAP (official) | Config |
| --- | --- | --- | --- |
| xdecoder-tiny_zeroshot_open-vocab-instance_coco | 39.8 | 39.7 | config |
Prepare your dataset according to the docs.
./tools/dist_test.sh projects/XDecoder/configs/xdecoder-tiny_zeroshot_open-vocab-panoptic_coco.py xdecoder_focalt_last_novg.pt 8
| Model | PQ | PQ (official) | Config |
| --- | --- | --- | --- |
| xdecoder-tiny_zeroshot_open-vocab-panoptic_coco | 51.42 | 51.16 | config |
Prepare your dataset according to the docs.
./tools/dist_test.sh projects/XDecoder/configs/xdecoder-tiny_zeroshot_open-vocab-ref-seg_refcocog.py xdecoder_focalt_last_novg.pt 8 --cfg-options test_dataloader.dataset.split='val'
| Model | text mode | cIoU | cIoU (official) | Config |
| --- | --- | --- | --- | --- |
| xdecoder_focalt_last_novg.pt | select_first | 58.8415 | 57.85 | config |
| xdecoder_focalt_last_novg.pt | original | 60.0321 | - | config |
| xdecoder_focalt_last_novg.pt | concat | 60.3551 | - | config |
Note:
1. If you set the scale of `Resize` to (1024, 512), the result will be 57.69.
2. `text mode` is the `RefCoCoDataset` parameter in MMDetection; it determines the texts loaded into the data list. It can be set to `select_first`, `original`, `concat`, or `random`:
- `select_first`: select the first text in the text list as the description of an instance.
- `original`: use all texts in the text list as the description of an instance.
- `concat`: concatenate all texts in the text list as the description of an instance.
- `random`: randomly select one text in the text list as the description of an instance; usually used for training.
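As a sketch of switching modes from Python (assuming `RefCoCoDataset` exposes the parameter as `text_mode`; the config and checkpoint paths follow the commands above):

```python
from mmengine.config import Config
from mmengine.runner import Runner

cfg = Config.fromfile(
    'projects/XDecoder/configs/xdecoder-tiny_zeroshot_open-vocab-ref-seg_refcocog.py')
cfg.load_from = 'xdecoder_focalt_last_novg.pt'    # checkpoint to evaluate
cfg.test_dataloader.dataset.text_mode = 'concat'  # or 'select_first' / 'original' / 'random'
cfg.work_dir = './work_dirs/xdecoder_refcocog_concat'

Runner.from_cfg(cfg).test()
```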
Prepare your dataset according to the docs.

Before testing, you need to install JDK 1.8; otherwise, the evaluation will report that java does not exist.
./tools/dist_test.sh projects/XDecoder/configs/xdecoder-tiny_zeroshot_caption_coco2014.py xdecoder_focalt_last_novg.pt 8
| Model | BLEU-4 | CIDEr | Config |
| --- | --- | --- | --- |
| xdecoder-tiny_zeroshot_caption_coco2014 | 35.26 | 116.81 | config |
Please refer to https://github.com/open-mmlab/mmdetection/blob/dev-3.x/projects/gradio_demo/README.md for details.
A total of 30 developers contributed to this release.
Thanks @jjjkkkjjj, @lovelykite, @minato-ellie, @freepoet, @wufan-tb, @yalibian, @keyakiluo, @gihanjayatilaka, @i-aki-y, @xin-li-67, @RangeKing, @JingweiZhang12, @MambaWong, @lucianovk, @tall-josh, @xiuqhou, @jamiechoi1995, @YQisme, @yechenzhi, @bjzhb666, @xiexinch, @jamiechoi1995, @yarkable, @Renzhihan, @nijkah, @amaizr, @Lum1104, @zwhus, @Czm369, @hhaAndroid
Published by hhaAndroid over 1 year ago
We have released the official version of MMDetection v3.0.0
- Fix `RTMDetIns` prior generator device error (#9964)
- Fix `img_shape` in data pipeline (#9966)
- Fix `solov2_r50_fpn_ms-3x_coco.py` config error (#10030)
- Fix `common/ms_3x_coco-instance.py` config error (#10056)
- Update `data_root` in `CocoOccludedSeparatedMetric` to fix bug (#9969)

A total of 19 developers contributed to this release.
Thanks @IRONICBo, @vansin, @RangeKing, @Ghlerrix, @okotaku, @JosonChan1998, @zgzhengSEU, @bobo0810, @yechenzhi, @Zheng-LinXiao, @LYMDLUT, @yarkable, @xiejiajiannb, @chhluo, @BIGWangYuDong, @RangiLyu, @zwhus, @hhaAndroid, @ZwwWayne
Published by ZwwWayne over 1 year ago
- Update `customize_runtime.md` (#9797)
- Fix `WIDERFace SSD` loss for NaN problem (#9734)

A total of 4 developers contributed to this release.
Thanks @co63oc, @Ginray, @vansin, @RangiLyu
Full Changelog: https://github.com/open-mmlab/mmdetection/compare/v2.28.1...v2.28.2
Published by ZwwWayne over 1 year ago
- Support ConvNeXt-V2, DiffusionDet, and inference of EfficientDet and Detic in `Projects`
- Support `DetInferencer` for inference, Test Time Augmentation, and automatically importing modules from registry
- Support ConvNeXt-V2 in `Projects` (#9619)
- Support DiffusionDet in `Projects` (#9639, #9768)
- Support inference of EfficientDet in `Projects` (#9645)
- Support inference of Detic in `Projects` (#9645)
- Support `DetInferencer` for inference (#9561)
- Fix `use_depthwise` in RTMDet (#9624)
- Fix `albumentations` augmentation post process with masks (#9551)
- Fix `LoadPanopticAnnotations` bug (#9703)
- Fix `isort` CI (#9680)
- Fix `MultiImageMixDataset` (#9764)
- `sklearn` (#9725)
- `Project` (#9599)
- Replace `github` with `gitee` in `.pre-commit-config-zh-cn.yaml` file (#9586)
- Update `isort` in `.pre-commit-config.yaml` file (#9701)
- Update mmcv to `2.0.0rc4` for `dev-3.x` (#9695)
- `DarknetBottleneck` (#9591)
- Support `non_blocking` parameters (#9723)
- Update `finetune.md` and `inference.md` (#9578)

A total of 27 developers contributed to this release.
Thanks @JosonChan1998, @RangeKing, @NoFish-528, @likyoo, @Xiangxu-0103, @137208, @PeterH0323, @tianleiSHI, @wufan-tb, @lyviva, @zwhus, @jshilong, @Li-Qingyun, @sanbuphy, @zylo117, @triple-Mu, @KeiChiTse, @LYMDLUT, @nijkah, @chg0901, @DanShouzhu, @zytx121, @vansin, @BIGWangYuDong, @hhaAndroid, @RangiLyu, @ZwwWayne
Full Changelog: https://github.com/open-mmlab/mmdetection/compare/v3.0.0rc5...v3.0.0rc6
Published by RangiLyu over 1 year ago
A total of 4 developers contributed to this release.
Thanks @triple-Mu, @i-aki-y, @twmht, @RangiLyu
Full Changelog: https://github.com/open-mmlab/mmdetection/compare/v2.28.0...v2.28.1
Published by ZwwWayne over 1 year ago
- Change `-` to `--format-only` in documentation.
- `DeformableDETRHead` (#9607)

A total of 11 developers contributed to this release.
Thanks @eantono, @akstt, @lpizzinidev, @RangiLyu, @kbumsik, @tianleiSHI, @nijkah, @BIGWangYuDong, @wangjiangben-hw, @jamiechoi1995, @ZwwWayne
Full Changelog: https://github.com/open-mmlab/mmdetection/compare/v2.27.0...v2.28.0
Published by RangiLyu almost 2 years ago
A total of 12 developers contributed to this release.
Thanks @Min-Sheng, @gasvn, @lzyhha, @jbwang1997, @zachcoleman, @chenyuwang814, @MilkClouds, @Fizzez, @boahc077, @apatsekin, @zytx121, @DonggeunYu
Full Changelog: https://github.com/open-mmlab/mmdetection/compare/v2.26.0...v2.27.0
Published by ZwwWayne almost 2 years ago
- Fix bug when `batch_size` is greater than 1 in inference (#9400)
- Fix `analyze_logs.py` to plot mAP and calculate train time correctly (#9409)
- `PAFPN` (#9450)
- Fix "`DeformableDETRHead` object has no attribute `loss_single`" (#9477)
- `analyze_results` (#9380)
- `builder.py` (#9479)
- `(width, height)` order (#9324)
- Update `.pre-commit-config-zh-cn.yaml` file (#9388)
- Update `FocalLoss` and `QualityFocalLoss` to allow different kinds of targets (#9481)
- `setup.cfg` (#9370)
- `[0, 1]` (#9391)
- Update `faq.md` (#9396)
- Update `get_started` (#9480)
- Update `useful_tools.md` and `useful_hooks.md` (#9453)
- `bfp` and `channel_mapper` (#9410)

A total of 20 developers contributed to this release.
Thanks @liuyanyi, @RangeKing, @lihua199710, @MambaWong, @sanbuphy, @Xiangxu-0103, @twmht, @JunyaoHu, @Chan-Sun, @tianleiSHI, @zytx121, @kitecats, @QJC123654, @JosonChan1998, @lvhan028, @Czm369, @BIGWangYuDong, @RangiLyu, @hhaAndroid, @ZwwWayne
Full Changelog: https://github.com/open-mmlab/mmdetection/compare/v3.0.0rc4...v3.0.0rc5
Published by ZwwWayne almost 2 years ago
- Add a `projects/` folder, which will be a place for some experimental models/features.
- Support SparseInst in `projects`
- Add a `projects/` folder, which will be a place for some experimental models/features (#9341)
- Support SparseInst in `projects` (#9377)
- Fix `pixel_decoder_type` discrimination in MaskFormer Head. (#9176)
- Move `utils/typing.py` to `utils/typing_utils.py` to fix `collect_env` error (#9265)
- Fix `inference_detector` (#9144)
- Fix `counts` in COCO's compressed RLE (#9274)
- Fix `print_config` (#9276)

A total of 13 developers contributed to this release.
Thanks @JunyaoHu, @sanbuphy, @Czm369, @Daa98, @jbwang1997, @BIGWangYuDong, @JosonChan1998, @lvhan028, @RunningLeon, @RangiLyu, @Daa98, @ZwwWayne, @hhaAndroid
Full Changelog: https://github.com/open-mmlab/mmdetection/compare/v3.0.0rc3...v3.0.0rc4
Published by ZwwWayne almost 2 years ago
A total of 11 developers contributed to this release.
Thanks @wangjiangben-hw, @motokimura, @AdorableJiang, @BainOuO, @JarvisKevin, @wanghonglie, @zytx121, @BIGWangYuDong, @hhaAndroid, @RangiLyu, @ZwwWayne
Full Changelog: https://github.com/open-mmlab/mmdetection/compare/v2.25.3...v2.26.0
Published by ZwwWayne almost 2 years ago
- Support `ignore_key` of `ConcatDataset` for training VOC datasets (#9058)
- Fix `XMLDataset` image size error (#9216)
- Fix `ignore_key` in VOC `ConcatDataset` (#9058)
- Add `box_type` support for `DynamicSoftLabelAssigner` (#9179)
- Add `seg_map_suffix` in `BaseDetDataset` (#9088)

A total of 13 developers contributed to this release.
Thanks @wanghonglie, @Wwupup, @sanbuphy, @BIGWangYuDong, @liuyanyi, @cxiang26, @jbwang1997, @ZwwWayne, @yuyoujiang, @RangiLyu, @hhaAndroid, @JosonChan1998, @Czm369
Full Changelog: https://github.com/open-mmlab/mmdetection/compare/v3.0.0rc2...v3.0.0rc3
Published by RangiLyu almost 2 years ago
A total of 13 developers contributed to this release.
Thanks @Zheng-LinXiao, @i-aki-y, @fbagci, @sudoAimer, @Czm369, @DrRyanHuang, @RangiLyu, @wanghonglie, @shinya7y, @Ryoo72, @akshaygulabrao, @gy-7, @Neesky
Full Changelog: https://github.com/open-mmlab/mmdetection/compare/v2.25.2...v2.25.3
Published by ZwwWayne almost 2 years ago
- Support `CrowdHumanDataset` and Metric (#8430)
- Add `FixShapeResize` to support resize of fixed shape (#8665)
- Fix `ConcatDataset` Import Error (#8909)
- Fix `CircleCI` and `readthedoc` build failures (#8980, #8963)
- Fix error when `out_shape` is different (#8993)
- Fix `Conv2d` weight channels (#8948)
- Fix `albumentations` (#9074)
- Fix `RTMDet` in metafile (#9098)
- Fix `OpenImageMetrics` in the config (#9061)
- `box type` (#8658)
- `BitmapMasks` and `PolygonMasks` (#9006)
- Update `robustness_eval.py` and `print_config` (#8452)
- `ConfigDict` and `dict` in `dense_heads` (#8942)
- `Normalize` transform (#8913)
- `PackDetInputs` (#8982)

A total of 13 developers contributed to this release.
Thanks @RangiLyu, @jbwang1997, @wanghonglie, @Chan-Sun, @RangeKing, @chhluo, @MambaWong, @yuyoujiang, @hhaAndroid, @sltlls, @Nioolek, @ZwwWayne, @wufan-tb
Full Changelog: https://github.com/open-mmlab/mmdetection/compare/v3.0.0rc1...v3.0.0rc2
Published by ZwwWayne about 2 years ago
- Fix `NumClassCheckHook` bug when model is wrapped (#8794)
- `FSAF` and `RepPoints` Head (#8813)
- `box type` (#8625)
- Support `SemiBaseDetector` and `SoftTeacher` (#8786)
- Update `analyze_results.py`, `analyze_logs.py` and `loading.py` (#8430, #8402, #8784)
- `test.py` (#8814)
- Fix `DetLocalVisualizer._draw_instances` (#8830)
- Fix `floordiv` warning in `SOLO` (#8738)

A total of 16 developers contributed to this release.
Thanks @ZwwWayne, @jbwang1997, @Czm369, @ice-tong, @Zheng-LinXiao, @chhluo, @RangiLyu, @liuyanyi, @wanghonglie, @levan92, @JiayuXu0, @nye0, @hhaAndroid, @xin-li-67, @shuxp, @zytx121
Full Changelog: https://github.com/open-mmlab/mmdetection/compare/v3.0.0rc0...v3.0.0rc1
Published by ZwwWayne about 2 years ago
- Fix `BboxOverlaps2D` (#8512)

A total of 16 developers contributed to this release.
Thanks @daquexian, @lyq10085, @ZwwWayne, @fbagci, @BubblyYi, @fathomson, @ShunchiZhang, @ceasona, @Happylkx, @normster, @chhluo, @Lehsuby, @JiayuXu0, @Nourollah, @hewanru-bit, @RangiLyu
Full Changelog: https://github.com/open-mmlab/mmdetection/compare/v2.25.1...v2.25.2
Published by ZwwWayne about 2 years ago
We are excited to announce the release of MMDetection 3.0.0rc0. MMDet 3.0.0rc0 is the first version of MMDetection 3.x, a part of the OpenMMLab 2.0 projects. Built upon the new training engine, MMDet 3.x unifies the interfaces of the dataset, models, evaluation, and visualization with faster training and testing speed. It also provides a general semi-supervised object detection framework and strong baselines.
New engine. MMDet 3.x is based on MMEngine, which provides a universal and powerful runner that allows more flexible customizations and significantly simplifies the entry points of high-level interfaces.
Unified interfaces. As a part of the OpenMMLab 2.0 projects, MMDet 3.x unifies and refactors the interfaces and internal logic of training, testing, datasets, models, evaluation, and visualization. All the OpenMMLab 2.0 projects share the same design in those interfaces and logic to allow the emergence of multi-task/modality algorithms.
Faster speed. We optimize the training and inference speed for common models and configurations, achieving faster or similar speed in comparison with Detectron2. Benchmark details of the models will be updated in this note.
General semi-supervised object detection. Benefitting from the unified interfaces, we support a general semi-supervised learning framework that works with all the object detectors supported in MMDet 3.x. Please refer to semi-supervised object detection for details.
Strong baselines. We release strong baselines of many popular models to enable fair comparisons among state-of-the-art models.
New features and algorithms:
More documentation and tutorials. We add a bunch of documentation and tutorials to help users get started more smoothly. Read it here.
MMDet 3.x has gone through big changes to have better design, higher efficiency, more flexibility, and more unified interfaces.
Besides the changes in API, we briefly list the major breaking changes in this section.
We will update the migration guide to provide complete details and migration instructions.
Users can also refer to the API doc for more details.
- `mmcv` is the version that provides pre-built CUDA operators and `mmcv-lite` does not, since MMCV 2.0.0rc0, while `mmcv-full` has been deprecated since 2.0.0rc0.
- Training-related entry code is removed from `mmdet.train.apis` and `tools/train.py`. Those codes have been migrated into MMEngine. Please refer to the migration guide of Runner in MMEngine for more details.

The Dataset classes implemented in MMDet 3.x all inherit from the `BaseDetDataset`, which inherits from the BaseDataset in MMEngine. In addition to the changes in interfaces, there are several changes in Dataset in MMDet 3.x.
The data transforms in MMDet 3.x all inherit from `BaseTransform` in MMCV>=2.0.0rc0, which defines a new convention in OpenMMLab 2.0 projects.
Besides the interface changes, there are several changes listed below:
- Transforms that combined multiple functionalities (e.g., `Resize`) are decomposed into several transforms to simplify and clarify the usages.
- The same transform, e.g., `Resize`, in MMDet 3.x and MMSeg 1.x will resize the image in the exact same manner given the same arguments.
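A minimal sketch of the convention (the transform below and its field assumptions are illustrative, not MMDet source): a custom transform subclasses `BaseTransform` and implements a single `transform` method that takes and returns the `results` dict.

```python
import numpy as np
from mmcv.transforms import BaseTransform


class ClampBoxes(BaseTransform):
    """Toy transform: clamp bounding boxes to the image canvas.

    Assumes results['gt_bboxes'] is an (N, 4) array in x1, y1, x2, y2 order
    and results['img_shape'] is (height, width).
    """

    def transform(self, results: dict) -> dict:
        h, w = results['img_shape']
        boxes = np.asarray(results['gt_bboxes'], dtype=np.float32)
        boxes[:, 0::2] = boxes[:, 0::2].clip(0, w)  # clamp x coordinates
        boxes[:, 1::2] = boxes[:, 1::2].clip(0, h)  # clamp y coordinates
        results['gt_bboxes'] = boxes
        return results
```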
The models in MMDet 3.x all inherit from `BaseModel` in MMEngine, which defines a new convention of models in OpenMMLab 2.0 projects.
Users can refer to the tutorial of the model in MMEngine for more details.
Accordingly, there are several changes as the following:
- The model takes `inputs` and `data_samples` as input, where `inputs` contains model inputs like a list of image tensors, and `data_samples` contains other information of the current data sample such as ground truths, region proposals, and model predictions. In this way, different tasks in MMDet 3.x can share the same input arguments, which makes the models more general and suitable for multi-task learning and some flexible training paradigms like semi-supervised learning.
- The model no longer has `forward_train`, `forward_test`, `simple_test`, and `aug_test` to deal with different model forward logics. In MMDet 3.x and OpenMMLab 2.0, the forward function has three modes: 'loss', 'predict', and 'tensor' for training, inference, and tracing or other purposes, respectively.
- The forward function calls `self.loss`, `self.predict`, and `self._forward` given the modes 'loss', 'predict', and 'tensor', respectively.
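A toy sketch of the three-mode convention (it follows `BaseModel`'s calling pattern but is not MMDet source):

```python
import torch
from torch import nn


class ToyDetector(nn.Module):
    """Illustrates the 'loss' / 'predict' / 'tensor' forward modes."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 8, 3, padding=1)

    def forward(self, inputs, data_samples=None, mode='tensor'):
        if mode == 'loss':      # training: return a dict of losses
            return self.loss(inputs, data_samples)
        if mode == 'predict':   # inference: return post-processed predictions
            return self.predict(inputs, data_samples)
        if mode == 'tensor':    # tracing or other purposes: raw tensors
            return self._forward(inputs)
        raise RuntimeError(f'Invalid mode "{mode}"')

    def loss(self, inputs, data_samples):
        # Placeholder loss; a real detector computes losses from data_samples.
        return {'loss_dummy': self._forward(inputs).mean()}

    def predict(self, inputs, data_samples):
        # Placeholder decode step; a real detector returns DetDataSamples.
        return self._forward(inputs).argmax(dim=1)

    def _forward(self, inputs):
        return self.backbone(inputs)


model = ToyDetector()
images = torch.rand(2, 3, 32, 32)
print(model(images, mode='loss'))  # {'loss_dummy': tensor(...)}
```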
The evaluation in MMDet 2.x strictly binds with the dataset. In contrast, MMDet 3.x decomposes the evaluation from the dataset so that all the detection datasets can evaluate with COCO AP and other metrics implemented in MMDet 3.x. MMDet 3.x mainly implements corresponding metrics for each dataset, which are manipulated by Evaluator to complete the evaluation.
Users can build an evaluator in MMDet 3.x to conduct offline evaluation, i.e., evaluate predictions that were not necessarily produced in MMDet 3.x, as long as the dataset and the predictions follow the dataset conventions. More details can be found in the tutorial in MMEngine.
The functions of visualization in MMDet 2.x are removed. Instead, in OpenMMLab 2.0 projects, we use Visualizer to visualize data. MMDet 3.x implements `DetLocalVisualizer` to allow visualization of ground truths, model predictions, feature maps, etc., at any place. It also supports sending the visualization data to any external visualization backends such as Tensorboard.
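A small usage sketch (the image is a random stand-in; a real `data_sample` would be a `DetDataSample` produced by a model):

```python
import numpy as np
from mmdet.visualization import DetLocalVisualizer

visualizer = DetLocalVisualizer()
image = np.random.randint(0, 255, (320, 320, 3), dtype=np.uint8)  # stand-in image

# With data_sample=None only the image is drawn; pass a DetDataSample to
# overlay ground truths and/or predictions.
visualizer.add_datasample(
    'demo', image, data_sample=None, draw_gt=False, out_file='vis_demo.jpg')
```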
MMDet 3.x adds the box data structures `HorizontalBoxes` and `BaseBoxes` to encapsulate different kinds of bounding boxes. We are migrating to use data structures of boxes to replace the use of pure tensor boxes. This will unify the usages of different kinds of bounding boxes in MMDet 3.x and MMRotate 1.x to simplify the implementation and reduce redundant codes.

We list several planned changes of MMDet 3.0.0rc0 so that the community could more comprehensively know the progress of MMDet 3.x. Feel free to create a PR, issue, or discussion if you are interested, have any suggestions and feedback, or want to participate.
- The scripts in the `tools` directory will have their python interfaces so that they can be used in Jupyter Notebook, Colab, and downstream libraries.
- A `DetWandbVisualizer` and maybe a hook are planned to fully migrate those functionalities from MMDet 2.x.

A total of 11 developers contributed to this release.
Thanks @shuxp, @wanghonglie, @Czm369, @BIGWangYuDong, @zytx121, @jbwang1997, @chhluo, @jshilong, @RangiLyu, @hhaAndroid, @ZwwWayne
Full Changelog: https://github.com/open-mmlab/mmdetection/compare/v2.25.0...v3.0.0rc0
Published by ZwwWayne about 2 years ago
- Fix `WandbLoggerHook` error (#8273)
- Add `mim` to `extras_require` in setup.py (#8194)
- Use `mim` in CI (#8230 & #8240)
- Fix `maskformer` to be compatible when cfg is a dictionary (#8263)
- Add `Pillow` version check in CI (#8229)

A total of 15 developers contributed to this release.
Thanks @ZwwWayne, @ayulockin, @Mxbonn, @p-mishra1, @Youth-Got, @MiXaiLL76, @chhluo, @jbwang1997, @atinfinity, @shinya7y, @duanzhihua, @STLAND-admin, @BIGWangYuDong, @grimoire, @xiaoyuan0203
Full Changelog: https://github.com/open-mmlab/mmdetection/compare/v2.25.0...v2.25.1
Published by ZwwWayne over 2 years ago
Support dedicated `WandbLogger` hook

Rename config files of Mask2Former (#7571):
- Before v2.25.0, `mask2former_xxx_coco.py` represents config files for panoptic segmentation.
- After v2.25.0, `mask2former_xxx_coco.py` represents config files for instance segmentation, and `mask2former_xxx_coco-panoptic.py` represents config files for panoptic segmentation.
- Fix bug when `interval != 1` (#7784)

Support dedicated `WandbLogger` hook (#7459)
Users can set
cfg.log_config.hooks = [
dict(type='MMDetWandbHook',
init_kwargs={'project': 'MMDetection-tutorial'},
interval=10,
log_checkpoint=True,
log_checkpoint_metadata=True,
num_eval_images=10)]
in the config to use `MMDetWandbHook`. An example can be found in this colab tutorial.
Add `AvoidOOM` to avoid OOM (#7434, #8091)
Try to use `AvoidCUDAOOM` to avoid GPU out of memory. It will first retry after calling `torch.cuda.empty_cache()`. If it still fails, it will then retry by converting the type of inputs to FP16 format. If it still fails, it will try to copy inputs from GPUs to CPUs to continue computing. Try `AvoidCUDAOOM` in code to make the code continue to run when GPU memory runs out:
from mmdet.utils import AvoidCUDAOOM
output = AvoidCUDAOOM.retry_if_cuda_oom(some_function)(input1, input2)
Users can also try `AvoidCUDAOOM` as a decorator to make the code continue to run when GPU memory runs out:
from mmdet.utils import AvoidCUDAOOM
@AvoidCUDAOOM.retry_if_cuda_oom
def function(*args, **kwargs):
...
return xxx
Support reading `gpu_collect` from `cfg.evaluation.gpu_collect` (#7672)
Speedup the Video Inference by Accelerating data-loading Stage (#7832)
Support replacing the `${key}` with the value of `cfg.key` (#7492); a hypothetical sketch follows after this feature list.
Accelerate result analysis in `analyze_result.py`. The evaluation is sped up by 10-15 times and only takes 10-15 minutes now. (#7891)
Support setting `block_dilations` in `DilatedEncoder` (#7812)
Support panoptic segmentation result analysis (#7922)
Release DyHead with Swin-Large backbone (#7733)
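On the `${key}` replacement above, a hypothetical config fragment (field names are illustrative only, not from the release notes):

```python
# Hypothetical config file: a '${key}' placeholder inside a string value is
# replaced with the value of cfg.key when the config is resolved (#7492).
dataset_type = 'CocoDataset'
work_dir = './work_dirs/retinanet_${dataset_type}'
# -> resolves to './work_dirs/retinanet_CocoDataset'
```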
Documentation updates and additions:
- `act_cfg` in `SwinTransformer` (#7794)
- Replace `markdownlint` with `mdformat` to avoid installing Ruby (#8009)

A total of 20 developers contributed to this release.
Thanks @ZwwWayne, @DarthThomas, @solyaH, @LutingWang, @chenxinfeng4, @Czm369, @Chenastron, @chhluo, @austinmw, @Shanyaliux, @hellock, @Y-M-Y, @jbwang1997, @hhaAndroid, @Irvingao, @zhanggefan, @BIGWangYuDong, @Keiku, @PeterVennerstrom, @ayulockin
Full Changelog: https://github.com/open-mmlab/mmdetection/compare/v2.24.1...v2.25.0