Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012 (see the ReLA sketch after this list)
Implementation of Transformer in Transformer, pixel level attention paired with patch level attention, for image classification
Implementation of Feedback Transformer in Pytorch
Implementation of Lie Transformer, Equivariant Self-Attention, in Pytorch
A variant of Transformer-XL where the memory is updated not with a queue, but with attention (see the memory-update sketch after this list)
Implementation of Linformer for Pytorch (see the low-rank attention sketch after this list)
Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning
Implementation of RQ Transformer, proposed in the paper "Autoregressive Image Generation using Residual Quantization"
An implementation of local windowed attention for language modeling (see the windowed attention sketch after this list)
Implementation of the Transformer variant proposed in "Transformer Quality in Linear Time" (see the gated attention unit sketch after this list)
Implementation of the Point Transformer layer, in Pytorch
An implementation of Performer, a linear attention-based transformer, in Pytorch (see the FAVOR+ sketch after this list)
Pytorch reimplementation of Molecule Attention Transformer, which uses a transformer to tackle the graph-like structure of molecules
Implementation of Fast Transformer in Pytorch
Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in Pytorch
Implementation of Block Recurrent Transformer - Pytorch
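The core change in ReLA (the first entry above) is replacing the softmax over attention scores with a ReLU, which yields sparse, unnormalized weights. A minimal sketch of that idea, assuming (batch, heads, seq_len, head_dim) tensors; the paper stabilizes training with RMS normalization, approximated here without a learned gain:

```python
import torch
import torch.nn.functional as F

def rela_attention(q, k, v, eps = 1e-8):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scale = q.shape[-1] ** -0.5
    scores = torch.einsum('b h i d, b h j d -> b h i j', q, k) * scale
    attn = F.relu(scores)  # ReLU instead of softmax: sparse, non-negative, unnormalized
    out = torch.einsum('b h i j, b h j d -> b h i d', attn, v)
    # gain-free RMS normalization over the feature dim, standing in for the paper's RMSNorm
    rms = out.pow(2).mean(dim = -1, keepdim = True).clamp(min = eps).sqrt()
    return out / rms
```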
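The Transformer-XL variant above swaps the usual FIFO memory queue for an attention step: existing memory slots attend over the new segment's hidden states and absorb them. A hypothetical sketch of such an update rule; the function name and the residual form are my assumptions, not the repo's API:

```python
import torch

def update_memory_with_attention(memory, hidden):
    # memory: (batch, mem_len, dim) -- persistent memory slots
    # hidden: (batch, seq_len, dim) -- hidden states from the current segment
    scale = memory.shape[-1] ** -0.5
    scores = torch.einsum('b m d, b n d -> b m n', memory, hidden) * scale
    attn = scores.softmax(dim = -1)
    update = torch.einsum('b m n, b n d -> b m d', attn, hidden)
    # residual update: each slot mixes in what it attended to, rather than being evicted
    return memory + update
```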
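Linformer's trick is projecting keys and values down the sequence axis to a fixed rank k, so attention costs O(n·k) instead of O(n²). A self-contained sketch assuming a fixed sequence length; the module name and defaults are illustrative, not the repo's API:

```python
import torch
import torch.nn as nn

class LinformerSelfAttention(nn.Module):
    def __init__(self, dim, seq_len, k = 256, heads = 8):
        super().__init__()
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.to_q = nn.Linear(dim, dim, bias = False)
        self.to_k = nn.Linear(dim, dim, bias = False)
        self.to_v = nn.Linear(dim, dim, bias = False)
        # learned low-rank projections along the sequence axis: n -> k
        self.proj_k = nn.Parameter(torch.randn(seq_len, k) / seq_len ** 0.5)
        self.proj_v = nn.Parameter(torch.randn(seq_len, k) / seq_len ** 0.5)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, x):
        b, n, d, h = *x.shape, self.heads
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
        # compress keys and values: (b, n, d) -> (b, k, d)
        k = torch.einsum('b n d, n k -> b k d', k, self.proj_k)
        v = torch.einsum('b n d, n k -> b k d', v, self.proj_v)
        # split heads
        q = q.reshape(b, n, h, -1).transpose(1, 2)
        k = k.reshape(b, -1, h, d // h).transpose(1, 2)
        v = v.reshape(b, -1, h, d // h).transpose(1, 2)
        attn = (q @ k.transpose(-1, -2) * self.scale).softmax(dim = -1)  # (b, h, n, k)
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.to_out(out)
```

Usage: `LinformerSelfAttention(dim = 512, seq_len = 1024)(torch.randn(2, 1024, 512))`. The fixed `seq_len` is the price of the learned low-rank projection.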
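Local windowed attention restricts each token to a window of neighbors. The simplest form is shown below, with non-overlapping windows and no causal masking (the actual repo also supports look-back overlap and masking), assuming seq_len divisible by the window size:

```python
import torch

def local_windowed_attention(q, k, v, window_size):
    # q, k, v: (batch, heads, seq_len, dim); seq_len must be divisible by window_size
    b, h, n, d = q.shape
    w = window_size
    # fold the sequence into non-overlapping windows and attend within each window
    q, k, v = (t.reshape(b, h, n // w, w, d) for t in (q, k, v))
    scores = torch.einsum('b h x i d, b h x j d -> b h x i j', q, k) * d ** -0.5
    attn = scores.softmax(dim = -1)
    out = torch.einsum('b h x i j, b h x j d -> b h x i d', attn, v)
    return out.reshape(b, h, n, d)
```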
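"Transformer Quality in Linear Time" builds on a gated attention unit (GAU): single-head attention whose softmax is replaced by a squared ReLU, with the attended values gated by a parallel branch. A rough sketch of the quadratic (non-chunked) GAU under my reading of the paper; the cheap per-use query/key offsets and scales are omitted, and the class name is illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GAU(nn.Module):
    # minimal gated attention unit, quadratic variant (no linear-time chunking)
    def __init__(self, dim, qk_dim = 128, expansion = 2):
        super().__init__()
        hidden = dim * expansion
        self.to_gate_value = nn.Linear(dim, hidden * 2)
        self.to_qk = nn.Linear(dim, qk_dim)
        self.to_out = nn.Linear(hidden, dim)

    def forward(self, x):
        # x: (batch, seq_len, dim)
        n = x.shape[-2]
        gate, value = F.silu(self.to_gate_value(x)).chunk(2, dim = -1)
        qk = self.to_qk(x)  # shared projection; the paper derives q and k from it cheaply
        sim = torch.einsum('b i d, b j d -> b i j', qk, qk) / n
        attn = F.relu(sim) ** 2            # squared ReLU replaces softmax
        out = torch.einsum('b i j, b j e -> b i e', attn, value)
        return self.to_out(gate * out)     # gate the attended values
```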
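Performer makes attention linear in sequence length by approximating the softmax kernel with positive random features (FAVOR+) and then exploiting associativity of matrix products. A minimal sketch using plain Gaussian features; the actual implementation uses orthogonal random features and periodic redrawing:

```python
import math
import torch

def softmax_kernel(x, projection):
    # positive random features: phi(x) = exp(w.x - |x|^2 / 2) / sqrt(m)
    x = x * x.shape[-1] ** -0.25  # fold in attention's 1/sqrt(d) scaling
    wx = torch.einsum('... n d, m d -> ... n m', x, projection)
    sq = x.pow(2).sum(dim = -1, keepdim = True) / 2
    return torch.exp(wx - sq) / math.sqrt(projection.shape[0])

def performer_attention(q, k, v, num_features = 256):
    projection = torch.randn(num_features, q.shape[-1], device = q.device)
    q_p, k_p = softmax_kernel(q, projection), softmax_kernel(k, projection)
    # associativity: (q_p @ k_p^T) @ v == q_p @ (k_p^T @ v); the latter is O(n)
    context = torch.einsum('... n m, ... n d -> ... m d', k_p, v)
    out = torch.einsum('... n m, ... m d -> ... n d', q_p, context)
    denom = torch.einsum('... n m, ... m -> ... n', q_p, k_p.sum(dim = -2))
    return out / denom.unsqueeze(-1).clamp(min = 1e-8)
```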