My attempts at applying the SoundStream design to learned tokenization of text, and then applying hierarchical attention to text generation
MIT License
This is an open source project (formerly named Listen, Attend and Spell - PyTorch Implementation)...
Implementation of Voicebox, new SOTA text-to-speech network from Meta AI, in PyTorch
Implementation of RQ Transformer, proposed in the paper "Autoregressive Image Generation using Re...
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in PyTorch
Implementation of GigaGAN, new SOTA GAN out of Adobe. Culmination of nearly a decade of research ...
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Re...
Implementation of Imagen, Google's Text-to-Image Neural Network, in PyTorch
Implementation of MagViT2 Tokenizer in PyTorch
Implementation of 'lightweight' GAN, proposed in ICLR 2021, in PyTorch. High resolution image gen...
Implementation of MusicLM, Google's new SOTA model for music generation using attention networks,...
Implementation of MeshGPT, SOTA Mesh generation using Attention, in PyTorch
Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in PyTorch
Implementation of Parti, Google's pure attention-based text-to-image neural network, in PyTorch
Implementation of Spear-TTS - multi-speaker text-to-speech attention network, in PyTorch