Real-time and accurate open-vocabulary end-to-end object detection
APACHE-2.0 License
If you are interested in our research, we welcome you to explore our other wonderful projects.
🔆 How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection (AAAI24) 🏠 GitHub Repository
🔆 OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network(IET Computer Vision)
This repository is the official PyTorch implementation for OmDet-Turbo, a fast transformer-based open-vocabulary object detection model.
⭐️Highlights
For more details, check out our paper Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head
Comparison of inference speeds for each component in tiny-size model.
Follow the Installation Instructions to set up the environment for OmDet-Turbo.
Language caching is already enabled when running inference with run_demo.py. For more details, open and inspect the run_demo.py script.
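The idea behind language caching can be sketched as follows. In an open-vocabulary detector, the text encoder only needs to run once per label set, so its output can be memoized and reused across frames. This is a minimal illustration, not the actual run_demo.py implementation; `encode_text` is a hypothetical stand-in for the model's real text encoder.

```python
# Minimal sketch of language caching: embed each unique label set once,
# then serve repeated queries from an in-memory cache.
call_count = 0  # tracks how many times the (slow) text encoder runs

def encode_text(label):
    """Hypothetical stand-in for the model's text encoder."""
    global call_count
    call_count += 1
    return [float(len(label))]  # placeholder embedding

_cache = {}

def cached_label_embeddings(labels):
    key = tuple(labels)  # label sets are cached by their exact contents
    if key not in _cache:
        _cache[key] = [encode_text(label) for label in labels]
    return _cache[key]

labels = ["person", "cat", "dog"]
emb_first = cached_label_embeddings(labels)   # encoder runs 3 times
emb_second = cached_label_embeddings(labels)  # served from cache, no re-encoding
```

With the same label set across a video stream, the encoder cost is paid only on the first frame.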
In the example above, post-processing is not included in the ONNX model, and all input sizes are fixed. You can add post-processing steps and change the input sizes to suit your needs.
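Since the exported ONNX model returns raw outputs, post-processing must be applied to them in your own code. A minimal sketch of one common recipe, assuming the model emits per-query class logits and boxes (the shapes, threshold, and top-k values here are illustrative, not OmDet-Turbo's exact output format):

```python
import numpy as np

def postprocess(logits, boxes, score_thr=0.3, top_k=100):
    """Turn raw (num_queries, num_classes) logits and (num_queries, 4) boxes
    into thresholded, score-sorted detections."""
    scores = 1.0 / (1.0 + np.exp(-logits))  # sigmoid over class logits
    labels = scores.argmax(axis=-1)         # best class per query
    best = scores.max(axis=-1)              # its confidence
    keep = best >= score_thr                # drop low-confidence queries
    best, labels, boxes = best[keep], labels[keep], boxes[keep]
    order = best.argsort()[::-1][:top_k]    # highest scores first
    return best[order], labels[order], boxes[order]

# Dummy outputs standing in for an ONNX Runtime session's results.
rng = np.random.default_rng(0)
logits = rng.normal(size=(900, 3))
boxes = rng.random((900, 4))
scores, labels, kept_boxes = postprocess(logits, boxes, score_thr=0.6)
```

Baking these steps into the exported graph instead is also possible, but keeping them outside makes the thresholds easy to tune at runtime.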
Performance on COCO and LVIS is evaluated under the zero-shot setting.
Model | Backbone | Pre-Train Data | COCO | LVIS | FPS (pytorch/trt) | Weight |
---|---|---|---|---|---|---|
OmDet-Turbo-Tiny | Swin-T | O365,GoldG | 42.5 | 30.3 | 21.5/140.0 | weight |
Please consider citing our papers if you use our projects:
@article{zhao2024real,
  title={Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head},
  author={Zhao, Tiancheng and Liu, Peng and He, Xuan and Zhang, Lu and Lee, Kyusong},
  journal={arXiv preprint arXiv:2403.06892},
  year={2024}
}

@article{zhao2024omdet,
  title={OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network},
  author={Zhao, Tiancheng and Liu, Peng and Lee, Kyusong},
  journal={IET Computer Vision},
  year={2024},
  publisher={Wiley Online Library}
}