My implementation of Cross Modal Retrieval models from CVPR'18 and ECCV'18
This is the official implementation of “Learn-to-Decompose: Cascaded Decomposition Network for Cr...
Code for EMNLP 2023 industry track paper "Learning Multilingual Sentence Representations with Cro...
[ECCV2018] Distractor-aware Siamese Networks for Visual Object Tracking
PyTorch code for "Real-World Blind Super-Resolution via Feature Matching with Implicit High-Reso...
A multi-task model which does image captioning, sentence paraphrasing and cross-modal retrieval.
SenseTime Research platform for single object tracking, implementing algorithms like SiamRPN and ...
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image capt...
[CVPR2023] Code Release of Aligning Bag of Regions for Open-Vocabulary Object Detection
[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | Based on the CPM foundation...
Code repo for realtime multi-person pose estimation in CVPR'17 (Oral)
[ICCV 2023] Code base for Revisiting Scene Text Recognition: A Data Perspective
An MXNet implementation of Mask R-CNN
[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable...
[ICCV2021] Code Release of Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images
This repository contains the official implementation to reproduce object detection results of ViP.