A high-level programming language for using computer vision.
MIT License
[CVPR2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Scenic: A Jax Library for Computer Vision Research and Beyond
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Syllabus for Scrapism @ SFPC / Fall 2022
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + ...
Klipse is a JavaScript plugin for embedding interactive code snippets in tech blogs.
Using LLMs and pre-trained caption models for super-human performance on image captioning.
VILA - a multi-image visual language model with training, inference and evaluation recipe, deploy...
[ICLR2024 Spotlight] Code Release of CLIPSelf: Vision Transformer Distills Itself for Open-Vocabu...
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. Approaching GPT-4o performance, an open-source multi...