Code for TCFormer, from the paper "Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer"
APACHE-2.0 License
Meta-Transformer for Unified Multimodal Learning
Official Code for "SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation"
【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large...
[ICCV2023 oral] Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstru...
[CVPR2021, PAMI2023] End-to-End Object Detection with Learnable Proposal
AdelaiDet is an open source toolbox for multiple instance-level detection and recognition tasks.
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities...
Code Release of F-LMM: Grounding Frozen Large Multimodal Models
This is a collection of our NAS and Vision Transformer work.
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image capt...
Official implementation of PVT series
The official repo for the CVPR 2021 paper ViPNAS: Efficient Video Pose Estimation via Neural Architecture S...
[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable...
A PyTorch implementation of "Real-time Scene Text Detection with Differentiable Binarization".