My implementation of Cross Modal Retrieval models from CVPR'18 and ECCV'18
[ICCV 2023] Code base for Revisiting Scene Text Recognition: A Data Perspective
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image capt...
SenseTime Research platform for single object tracking, implementing algorithms like SiamRPN and ...
Code repo for realtime multi-person pose estimation in CVPR'17 (Oral)
[ECCV2018] Distractor-aware Siamese Networks for Visual Object Tracking
An MXNet implementation of Mask R-CNN
Code for EMNLP 2023 industry track paper "Learning Multilingual Sentence Representations with Cro...
[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | Based on the CPM foundation mod...
[CVPR2023] Code Release of Aligning Bag of Regions for Open-Vocabulary Object Detection
A multi-task model which does image captioning, sentence paraphrasing and cross-modal retrieval.
This repository contains the official implementation to reproduce object detection results of ViP.
This is the official implementation of “Learn-to-Decompose: Cascaded Decomposition Network for Cr...
[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable...
[ICCV2021] Code Release of Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images
PyTorch code for "Real-World Blind Super-Resolution via Feature Matching with Implicit High-Reso...