A simple Transformer where the softmax has been replaced with normalization
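The description leaves open which normalization stands in for the softmax. Below is a minimal sketch of one such softmax-free recipe, assuming L1 normalization of queries and keys so the scores stay bounded without exponentiation; the class name and this particular choice of norm are assumptions, not the repo's actual code.

```python
import torch
from torch import nn

class SoftmaxFreeAttention(nn.Module):
    """Attention with the softmax swapped for plain normalization (a sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, x):                           # x: (batch, seq, dim)
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        # L1-normalize q and k along the feature dim, instead of
        # softmax-normalizing the score matrix
        q = q / q.abs().sum(dim=-1, keepdim=True).clamp(min=1e-6)
        k = k / k.abs().sum(dim=-1, keepdim=True).clamp(min=1e-6)
        attn = q @ k.transpose(-2, -1)              # raw scores, no softmax
        return self.to_out(attn @ v)
```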
Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning
Implementation of a Transformer, but completely in Triton
Implementation of RQ Transformer, proposed in the paper "Autoregressive Image Generation using Residual Quantization"
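For orientation, the core of residual quantization, sketched under assumed shapes: each codebook quantizes the residual left over by the levels before it, and the RQ Transformer then models the resulting code stacks autoregressively. The function name and codebook layout here are illustrative.

```python
import torch

def residual_quantize(x, codebooks):
    """x: (n, dim); codebooks: list of (codebook_size, dim) tensors.
    Returns one code index per level; each level quantizes the residual
    left behind by the levels before it."""
    residual, codes = x, []
    for codebook in codebooks:
        idx = torch.cdist(residual, codebook).argmin(dim=-1)  # nearest code
        codes.append(idx)
        residual = residual - codebook[idx]                   # pass on the remainder
    return torch.stack(codes, dim=-1)                         # (n, num_levels)
```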
A Transformer made of Rotation-equivariant Attention using Vector Neurons
Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/ab...
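A minimal sketch of the ReLA idea, as I read the paper: the softmax is replaced by a plain ReLU over the attention scores, and the output is RMS-renormalized since the rectified weights no longer sum to one. The paper's gating variant is omitted, and all names here are assumptions.

```python
import torch
from torch import nn

class ReLAttention(nn.Module):
    """Rectified linear attention: ReLU over scores instead of softmax (a sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.scale = dim ** -0.5
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.gain = nn.Parameter(torch.ones(dim))   # learned RMS-norm gain
        self.to_out = nn.Linear(dim, dim)

    def forward(self, x):                           # x: (batch, seq, dim)
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        attn = torch.relu(q @ k.transpose(-2, -1) * self.scale)  # no softmax
        out = attn @ v
        # RMS-renormalize, since ReLU weights don't sum to one
        out = out * torch.rsqrt(out.pow(2).mean(dim=-1, keepdim=True) + 1e-8)
        return self.to_out(out * self.gain)
```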
Implementation of Transformer in Transformer, pixel level attention paired with patch level attention
Implementation of Fast Transformer in Pytorch
An implementation of Performer, a linear attention-based transformer, in Pytorch
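To see why this is linear: Performer approximates exp(q·k) with a dot product of positive random features φ(q)·φ(k), after which φ(Q)(φ(K)ᵀV) can be computed right-to-left without ever materializing the n×n score matrix. A simplified sketch, assuming an unstructured random projection (the real FAVOR+ machinery uses orthogonal features and periodic redrawing):

```python
import math
import torch

def performer_attention(q, k, v, num_features=64):
    """q, k, v: (batch, seq, dim). Positive random features approximating
    softmax attention; cost is linear in sequence length."""
    dim = q.shape[-1]
    proj = torch.randn(num_features, dim, device=q.device)

    def phi(x):
        x = x * dim ** -0.25                       # split the usual 1/sqrt(dim) scaling
        return torch.exp(x @ proj.t() - x.pow(2).sum(-1, keepdim=True) / 2) \
               / math.sqrt(num_features)

    q, k = phi(q), phi(k)                          # (batch, seq, num_features)
    context = k.transpose(-2, -1) @ v              # (batch, features, dim): linear in seq
    normalizer = q @ k.sum(dim=-2).unsqueeze(-1)   # row sums of the implicit attention
    return (q @ context) / normalizer.clamp(min=1e-6)
```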
An implementation of local windowed attention for language modeling
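The simplest version of the idea, as a sketch: pad to a multiple of the window size, fold the sequence into windows, and run ordinary attention within each one, making the cost linear in sequence length. The repo itself also handles causal masking and look-back into neighboring windows, which this omits.

```python
import torch
import torch.nn.functional as F

def local_attention(q, k, v, window_size=64):
    """q, k, v: (batch, seq, dim). Full attention within non-overlapping windows."""
    b, n, d = q.shape
    pad = (window_size - n % window_size) % window_size
    q, k, v = (F.pad(t, (0, 0, 0, pad)) for t in (q, k, v))
    # fold into (batch, num_windows, window_size, dim)
    q, k, v = (t.reshape(b, -1, window_size, d) for t in (q, k, v))
    scores = q @ k.transpose(-2, -1) * d ** -0.5
    if pad:  # keep queries from attending to the zero padding in the last window
        scores[:, -1, :, -pad:] = float('-inf')
    out = scores.softmax(dim=-1) @ v
    return out.reshape(b, -1, d)[:, :n]            # unfold and drop padding
```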
Implementation of Parti, Google's pure attention-based text-to-image neural network, in Pytorch
Implementation of Agent Attention in Pytorch
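A sketch of the factorization the paper describes: a small set of agent tokens (here pooled from the queries; the pooling choice is an assumption) first attends over all keys and values, then the actual queries attend only to the agents, turning one quadratic softmax attention into two skinny ones.

```python
import torch
import torch.nn.functional as F

def agent_attention(q, k, v, num_agents=16):
    """q, k, v: (batch, seq, dim). Quadratic attention factored through agents."""
    d = q.shape[-1]
    # agent tokens: average-pool the queries down to num_agents tokens
    agents = F.adaptive_avg_pool1d(q.transpose(1, 2), num_agents).transpose(1, 2)
    # agents aggregate the full sequence ...
    agent_v = (agents @ k.transpose(-2, -1) * d ** -0.5).softmax(dim=-1) @ v
    # ... and the queries read from the agents
    return (q @ agents.transpose(-2, -1) * d ** -0.5).softmax(dim=-1) @ agent_v
```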
Implementation of Lie Transformer, Equivariant Self-Attention, in Pytorch
Implementation of the transformer proposed in "Building Blocks for a Complex-Valued Transformer Architecture"
Implementation of TabTransformer, attention network for tabular data, in Pytorch
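To make the shape of the architecture concrete, a compressed sketch under assumed hyperparameters: each categorical column gets its own embedding, a Transformer contextualizes the column embeddings, and the flattened result is concatenated with layer-normed continuous features before an MLP head. This is one reading of the design, not the repo's API.

```python
import torch
from torch import nn

class TabTransformerSketch(nn.Module):
    def __init__(self, cardinalities, num_cont, dim=32, depth=3, heads=4, classes=2):
        super().__init__()
        # one embedding table per categorical column
        self.embeds = nn.ModuleList(nn.Embedding(c, dim) for c in cardinalities)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, depth)
        self.cont_norm = nn.LayerNorm(num_cont)
        self.head = nn.Sequential(
            nn.Linear(dim * len(cardinalities) + num_cont, dim * 2),
            nn.ReLU(),
            nn.Linear(dim * 2, classes),
        )

    def forward(self, x_cat, x_cont):
        # x_cat: (batch, num_categorical) int64, x_cont: (batch, num_cont) float
        tokens = torch.stack([e(x_cat[:, i]) for i, e in enumerate(self.embeds)], dim=1)
        tokens = self.transformer(tokens)          # contextualized column embeddings
        flat = tokens.flatten(1)                   # (batch, num_categorical * dim)
        return self.head(torch.cat((flat, self.cont_norm(x_cont)), dim=-1))
```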
Implementation of the Point Transformer layer, in Pytorch
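A sketch of the vector attention at the heart of the layer: per-channel attention weights come from an MLP over q − k plus a learned relative-position encoding, and the same encoding is added to the values. For brevity this attends over all point pairs rather than the paper's k-nearest neighbors, and the module name and MLP widths are assumptions.

```python
import torch
from torch import nn

class PointTransformerLayerSketch(nn.Module):
    def __init__(self, dim, pos_dim=3):
        super().__init__()
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.pos_mlp = nn.Sequential(nn.Linear(pos_dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.attn_mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x, pos):
        # x: (batch, n, dim) point features, pos: (batch, n, 3) coordinates
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        rel = self.pos_mlp(pos[:, :, None] - pos[:, None, :])    # relative position encoding
        w = self.attn_mlp(q[:, :, None] - k[:, None, :] + rel)   # per-channel attention logits
        w = w.softmax(dim=2)                                     # normalize over neighbors
        return (w * (v[:, None, :] + rel)).sum(dim=2)            # (batch, n, dim)
```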