COYO-700M: Large-scale Image-Text Pair Dataset
[CAAI AIR'24] Bilateral Reference for High-Resolution Dichotomous Image Segmentation
VILA - a multi-image visual language model with training, inference and evaluation recipe, deploy...
[CVPR'23, Highlight] ECON: Explicit Clothed humans Optimized via Normal integration
[CVPR 2024 Highlight] FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
🔥🔥🔥🔥 (Formerly YOLOv7, not the official one) YOLO with Transformers and Instance Segmentation, with Ten...
Flickr-Faces-HQ Dataset (FFHQ)
Kolors Team
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
[ICML'23] StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis
The RedPajama-Data repository contains code for preparing large datasets for training large langu...
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Docu...
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
I decided to sync up this repo and self-critical.pytorch. (The old master is in the old master branch ...
IDM-VTON: Improving Diffusion Models for Authentic Virtual Try-on in the Wild
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"