Another attempt at a long-context / efficient transformer by me
Implementation of Infini-Transformer in PyTorch
Implementations and explorations into the ReST^EM algorithm from the new DeepMind paper "Beyond Huma...
Implementation of Long-Short Transformer, combining local and global inductive biases for attenti...
Yet another random morning idea, to be quickly tried, with the architecture shared if it works; to allow...
Implementation of Hourglass Transformer, in PyTorch, from Google and OpenAI
A simple Transformer where the softmax has been replaced with normalization
Implementation of the transformer proposed in "Building Blocks for a Complex-Valued Transformer A...
Implementation of Fast Transformer in PyTorch
Implementation of Cross Transformer for spatially-aware few-shot transfer, in PyTorch
Implementation of Transformer in Transformer, pixel-level attention paired with patch-level atten...
Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning
Implementation of the Point Transformer layer, in PyTorch
Implementation of GateLoop Transformer in PyTorch and JAX
Implementation of a Transformer that Ponders, using the scheme from the PonderNet paper
Implementation of RQ Transformer, proposed in the paper "Autoregressive Image Generation using Re...