[CVPR 2024] Real-Time Open-Vocabulary Object Detection
GPL-3.0 License
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source, near-GPT-4o-performance multi...
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
Implementation of paper - Multi-Branch Auxiliary Fusion YOLO with Re-parameterization Heterogeneo...
[CVPR 2024 Highlight] Monkey (LMM): Image Resolution and Text Label Are Important Things for Large...
YOLOv3 in PyTorch > ONNX > CoreML > TFLite
☁️💡🎈 Focused on improving YOLOv7; supports improvements to the Backbone, Neck, Head, Loss, IoU, NMS, and other modules
Minimal PyTorch implementation of YOLOv3
Code Release of F-LMM: Grounding Frozen Large Multimodal Models
VILA - a multi-image visual language model with training, inference and evaluation recipe, deploy...
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
yolov5 + csl_label (Oriented Object Detection, Rotation Detection, Rotated BBox): rotated object detection based on yolov5
[EMNLP 2024🔥] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
🔥🔥🔥🔥 (Earlier YOLOv7, not the official one) YOLO with Transformers and Instance Segmentation, with Ten...