Get hundreds of millions of image+url pairs from the crawling at home dataset and preprocess them
Stable Diffusion web UI
Dromedary: towards helpful, ethical and reliable LLMs.
VILA - a multi-image visual language model with training, inference and evaluation recipe, deploy...
Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tune...
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M u...
Contrastive Language-Audio Pretraining
Recaption large (Web)Datasets with vllm and save the artifacts.
Easily convert Common Crawl to a dataset of captions and documents. Image/text Audio/text Video/tex...
A PyTorch implementation of "Real-time Scene Text Detection with Differentiable Binarization".
Official repo for VGen: a holistic video generation ecosystem for video generation building on di...
Deep Learning model Zoo
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing>
Using LLMs and pre-trained caption models for super-human performance on image captioning.
Non-local Neural Networks for Video Classification