Implementation of GateLoop Transformer in Pytorch and Jax
Unofficial implementation of iTransformer - SOTA Time Series Forecasting using Attention networks...
Implementation of the Transformer variant proposed in "Transformer Quality in Linear Time"
An implementation of Performer, a linear attention-based transformer, in Pytorch
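To illustrate what "linear attention" means here: softmax attention costs O(n²·d) in sequence length n, while linear attention reorders the computation as φ(Q)(φ(K)ᵀV), which costs O(n·d²). The sketch below uses the simple elu(x)+1 feature map (as in "Transformers are RNNs", not Performer's FAVOR+ random-feature approximation) and is a generic, non-causal illustration, not this repository's implementation:

```python
import numpy as np

def linear_attention(q, k, v):
    # Generic non-causal linear attention: replaces softmax(QK^T)V with
    # phi(Q) @ (phi(K)^T @ V), so cost is linear in sequence length.
    # Feature map phi is elu(x) + 1, which keeps all weights nonnegative.
    # (Performer itself uses random features to approximate softmax.)
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    q, k = phi(q), phi(k)
    kv = k.T @ v                  # (d, d_v) summary of all keys/values
    z = q @ k.sum(axis=0)         # per-query normalizer, shape (n,)
    return (q @ kv) / z[:, None]

rng = np.random.default_rng(0)
n, d = 8, 4
q, k, v = rng.normal(size=(3, n, d))
out = linear_attention(q, k, v)
print(out.shape)  # (8, 4)
```

Because the feature map is nonnegative, each output row is a convex combination of the value rows, mirroring softmax attention's behavior while avoiding the n×n attention matrix.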
Implementation of MeshGPT, SOTA Mesh generation using Attention, in Pytorch
Implementation of Feedback Transformer in Pytorch
Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning
Implementation of an Attention layer where each head can attend to more than just one token, usin...
Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and ...
Explorations into the recently proposed Taylor Series Linear Attention
An implementation of local windowed attention for language modeling
Implementation of Hourglass Transformer, in Pytorch, from Google and OpenAI
Implementation of Block Recurrent Transformer - Pytorch
Implementation of Q-Transformer, Scalable Offline Reinforcement Learning via Autoregressive Q-Fun...
Modular Python implementation of encoder-only, decoder-only and encoder-decoder transformer archi...
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating poin...