Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch
MIT License
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
DALL·E Mini - Generate images from a text prompt
Using LLMs and pre-trained caption models for super-human performance on image captioning.
Implementation of Phenaki Video, which uses MaskGIT to produce text-guided videos of up to 2 min...
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities...
Implementation of Muse: Text-to-Image Generation via Masked Generative Transformers, in Pytorch
A concise but complete implementation of CLIP with various experimental improvements from recent ...
Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net o...
Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the PaLM architectu...
Implementation of Parti, Google's pure attention-based text-to-image neural network, in Pytorch
Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks ...
Implementation of MagViT2 Tokenizer in Pytorch
Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling wit...
Implementation of Voicebox, a new SOTA text-to-speech network from Meta AI, in Pytorch