Embed arbitrary modalities (images, audio, documents, etc.) into large language models.
Apache-2.0 License
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
An open platform for training, serving, and evaluating large language models. Release repo for Vi...
Home of StarCoder: fine-tuning & inference!
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
Chain-of-Hindsight, A Scalable RLHF Method
PyTorch implementation, based on YOLOv4, of the paper: "Complex-YOLO: Real-time 3D Object Detec...
Official code of CVPR 2021's PLOP: Learning without Forgetting for Continual Semantic Segmentation
VILA - a multi-image visual language model with training, inference and evaluation recipe, deploy...
Pythonic AI generation of images and videos
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.