Implementation of 🦩 Flamingo, a state-of-the-art few-shot visual question answering attention network from DeepMind, in PyTorch
MIT License
Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the PaLM architectu...
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + ...
Implementation of Voicebox, a new SOTA text-to-speech network from Meta AI, in PyTorch
Implementation of Parti, Google's pure attention-based text-to-image neural network, in PyTorch
Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks ...
Implementation of Video Diffusion Models, Jonathan Ho's new paper extending DDPMs to Video Genera...
Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling wit...
Implementation of Muse: Text-to-Image Generation via Masked Generative Transformers, in PyTorch
Implementation of Phenaki Video, which uses MaskGIT to produce text-guided videos of up to 2 min...
Implementation of NaturalSpeech 2, a zero-shot speech and singing synthesizer, in PyTorch
Implementation of CoCa (Contrastive Captioners are Image-Text Foundation Models), in PyTorch
Implementation of the MagViT2 tokenizer, in PyTorch
Modeling, training, eval, and inference code for OLMo