Real-time and accurate open-vocabulary end-to-end object detection
APACHE-2.0 License
If you are interested in our research, we welcome you to explore our other wonderful projects.
🔆 How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection (AAAI24) 🏠 GitHub Repository
🔆 OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network(IET Computer Vision)
This repository is the official PyTorch implementation for OmDet-Turbo, a fast transformer-based open-vocabulary object detection model.
⭐️Highlights
For more details, check out our paper Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head
Comparison of inference speeds for each component in tiny-size model.
Follow the Installation Instructions to set up the environment for OmDet-Turbo.
Language caching is already enabled when running inference with run_demo.py. For more details, open and inspect the run_demo.py script.
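The idea behind language caching can be sketched as follows. In an open-vocabulary detector, the text encoder only needs to run once per label set, so its output can be memoized and reused across frames. This is a minimal illustration, not the actual run_demo.py implementation; `encode_text` is a hypothetical stand-in for the model's real text encoder.

```python
# Minimal sketch of language caching: embed each unique label set once,
# then serve repeated queries from an in-memory cache.
call_count = 0  # tracks how many times the (slow) text encoder runs

def encode_text(label):
    """Hypothetical stand-in for the model's text encoder."""
    global call_count
    call_count += 1
    return [float(len(label))]  # placeholder embedding

_cache = {}

def cached_label_embeddings(labels):
    key = tuple(labels)  # label sets are cached by their exact contents
    if key not in _cache:
        _cache[key] = [encode_text(label) for label in labels]
    return _cache[key]

labels = ["person", "cat", "dog"]
emb_first = cached_label_embeddings(labels)   # encoder runs 3 times
emb_second = cached_label_embeddings(labels)  # served from cache, no re-encoding
```

With the same label set across a video stream, the encoder cost is paid only on the first frame.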
In the example above, post-processing is not included in the ONNX model, and all input sizes are fixed. You can add post-processing steps and change the input sizes to suit your needs.
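Since the exported ONNX model returns raw outputs, post-processing must be applied to them in your own code. A minimal sketch of one common recipe, assuming the model emits per-query class logits and boxes (the shapes, threshold, and top-k values here are illustrative, not OmDet-Turbo's exact output format):

```python
import numpy as np

def postprocess(logits, boxes, score_thr=0.3, top_k=100):
    """Turn raw (num_queries, num_classes) logits and (num_queries, 4) boxes
    into thresholded, score-sorted detections."""
    scores = 1.0 / (1.0 + np.exp(-logits))  # sigmoid over class logits
    labels = scores.argmax(axis=-1)         # best class per query
    best = scores.max(axis=-1)              # its confidence
    keep = best >= score_thr                # drop low-confidence queries
    best, labels, boxes = best[keep], labels[keep], boxes[keep]
    order = best.argsort()[::-1][:top_k]    # highest scores first
    return best[order], labels[order], boxes[order]

# Dummy outputs standing in for an ONNX Runtime session's results.
rng = np.random.default_rng(0)
logits = rng.normal(size=(900, 3))
boxes = rng.random((900, 4))
scores, labels, kept_boxes = postprocess(logits, boxes, score_thr=0.6)
```

Baking these steps into the exported graph instead is also possible, but keeping them outside makes the thresholds easy to tune at runtime.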
Performance on COCO and LVIS is evaluated under the zero-shot setting.
Model | Backbone | Pre-Train Data | COCO | LVIS | FPS (pytorch/trt) | Weight |
---|---|---|---|---|---|---|
OmDet-Turbo-Tiny | Swin-T | O365,GoldG | 42.5 | 30.3 | 21.5/140.0 | weight |
Please consider citing our papers if you use our projects:
@article{zhao2024real,
  title={Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head},
  author={Zhao, Tiancheng and Liu, Peng and He, Xuan and Zhang, Lu and Lee, Kyusong},
  journal={arXiv preprint arXiv:2403.06892},
  year={2024}
}

@article{zhao2024omdet,
  title={OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network},
  author={Zhao, Tiancheng and Liu, Peng and Lee, Kyusong},
  journal={IET Computer Vision},
  year={2024},
  publisher={Wiley Online Library}
}