Transformers with Arbitrarily Large Context
Apache-2.0 License
Fast and memory-efficient exact attention
Implementation of Agent Attention in Pytorch
Graph neural network message passing reframed as a Transformer with local attention
Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch
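Ring Attention shards the sequence across devices and rotates key/value blocks around a ring while each device accumulates exact attention for its own query block with an online softmax. Below is a minimal single-process sketch of that blockwise accumulation; the function name, the non-causal simplification, and the absence of actual ring communication are my assumptions, not the repo's API.

```python
# Single-process simulation of the blockwise online-softmax accumulation behind Ring Attention:
# the sequence is split into blocks (one per "device"), kv blocks arrive one hop at a time,
# and each query block folds them in without ever materializing the full N x N matrix.
import torch

def ring_attention_sim(q, k, v, num_blocks=4):
    # q, k, v: (batch, heads, seq_len, dim_head); non-causal for simplicity
    scale = q.shape[-1] ** -0.5
    q_blocks = q.chunk(num_blocks, dim=-2)
    k_blocks = list(k.chunk(num_blocks, dim=-2))
    v_blocks = list(v.chunk(num_blocks, dim=-2))
    outs = []
    for qb in q_blocks:                              # each "device" owns one query block
        m = torch.full(qb.shape[:-1], float('-inf'), device=q.device)  # running row max
        l = torch.zeros(qb.shape[:-1], device=q.device)                # running softmax denominator
        acc = torch.zeros_like(qb)                                     # running weighted values
        for kb, vb in zip(k_blocks, v_blocks):       # kv blocks visit every query block
            s = qb @ kb.transpose(-2, -1) * scale
            m_new = torch.maximum(m, s.amax(dim=-1))
            p = torch.exp(s - m_new[..., None])
            correction = torch.exp(m - m_new)        # rescale previous partial sums
            l = l * correction + p.sum(dim=-1)
            acc = acc * correction[..., None] + p @ vb
            m = m_new
        outs.append(acc / l[..., None])
    return torch.cat(outs, dim=-2)

q = k = v = torch.randn(1, 4, 1024, 64)
out = ring_attention_sim(q, k, v, num_blocks=4)
# agrees with dense softmax attention up to floating-point error
```

A real implementation overlaps the ring send/receive of kv blocks with the attention compute on each hop; the math above is just the exactness argument in code.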
An implementation of local windowed attention for language modeling
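Local windowed attention restricts each position to a fixed-size window of recent tokens. A minimal sketch of the causal sliding-window mask, assuming a standard (batch, heads, length, dim) layout rather than the repo's actual interface:

```python
# Causal sliding-window attention: each query attends only to the previous `window` tokens.
import torch
import torch.nn.functional as F

def local_windowed_attention(q, k, v, window=256):
    # q, k, v: (batch, heads, seq_len, dim_head)
    n = q.shape[-2]
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5          # (b, h, n, n)
    i = torch.arange(n, device=q.device)
    # allow position i to see positions j with i - window < j <= i
    allowed = (i[None, :] <= i[:, None]) & (i[:, None] - i[None, :] < window)
    scores = scores.masked_fill(~allowed, float('-inf'))
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 8, 1024, 64)
out = local_windowed_attention(q, k, v, window=128)                # (1, 8, 1024, 64)
```

This toy version still materializes the full attention matrix; practical implementations bucket queries and keys into blocks so the cost stays proportional to sequence length times window size.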
An implementation of Performer, a linear attention-based transformer, in Pytorch
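Performer's core trick is replacing softmax attention with a kernel feature map so the computation can be reassociated into linear time and memory in sequence length. Performer proper uses FAVOR+ random features to approximate softmax; the sketch below substitutes a simple elu(x) + 1 feature map purely to show the O(N) reordering, and is not the repo's API.

```python
# Non-causal linear attention: compute sum_n phi(k_n) v_n^T once, then apply it to every query.
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    # q, k, v: (batch, heads, seq_len, dim_head)
    q = F.elu(q) + 1                                     # stand-in feature map phi(.)
    k = F.elu(k) + 1
    kv = torch.einsum('bhnd,bhne->bhde', k, v)           # (dim_head, dim_head) summary, O(N d^2)
    z = 1 / (torch.einsum('bhnd,bhd->bhn', q, k.sum(dim=-2)) + eps)   # per-query normalizer
    return torch.einsum('bhnd,bhde,bhn->bhne', q, kv, z)

q = k = v = torch.randn(1, 8, 4096, 64)
out = linear_attention(q, k, v)                          # (1, 8, 4096, 64)
```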
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens
Implementation of Block Recurrent Transformer - Pytorch
Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch
Implementation of the transformer proposed in "Building Blocks for a Complex-Valued Transformer Architecture"
Implementation of E(n)-Transformer, which incorporates attention mechanisms into Welling's E(n)-Equivariant Graph Neural Network
(Unofficial) Implementation of dilated attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens"
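Dilated attention splits the sequence into segments and subsamples each segment with a dilation stride before running dense attention, mixing several (segment length, dilation) pairs across heads. A rough sketch of a single such branch follows; the function name and the zero-fill for unselected positions are illustrative assumptions, not the repo's interface.

```python
# One (segment_len, dilation) branch of LongNet-style dilated attention.
import torch
import torch.nn.functional as F

def dilated_attention_branch(q, k, v, segment_len=256, dilation=2):
    b, h, n, d = q.shape
    assert n % segment_len == 0
    # reshape into segments, then keep every `dilation`-th position inside each segment
    def sparsify(x):
        x = x.view(b, h, n // segment_len, segment_len, d)
        return x[:, :, :, ::dilation, :]
    qs, ks, vs = map(sparsify, (q, k, v))
    scores = qs @ ks.transpose(-2, -1) / d ** 0.5        # dense attention within each sparse segment
    out_s = F.softmax(scores, dim=-1) @ vs
    # scatter results back to their original positions; positions this branch skipped stay zero
    # (in the full method, other branches with different strides/offsets cover them)
    out = torch.zeros(b, h, n // segment_len, segment_len, d, device=q.device)
    out[:, :, :, ::dilation, :] = out_s
    return out.view(b, h, n, d)

q = k = v = torch.randn(1, 8, 1024, 64)
out = dilated_attention_branch(q, k, v, segment_len=256, dilation=2)
```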
A variant of Transformer-XL where the memory is updated not with a queue, but with attention
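Per the description, the recurrent memory here is refreshed by attention rather than by enqueueing the newest hidden states as Transformer-XL does. A hedged sketch of one way such an update could look, with every module and parameter name being an illustrative assumption rather than the repo's code:

```python
# Fixed-size memory slots that attend over the new segment's hidden states to update themselves,
# instead of a FIFO queue of past activations.
import torch
import torch.nn as nn

class AttentionMemoryUpdate(nn.Module):
    def __init__(self, dim, num_mem=64, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.init_mem = nn.Parameter(torch.randn(num_mem, dim))

    def initial_memory(self, batch):
        return self.init_mem.unsqueeze(0).expand(batch, -1, -1)

    def forward(self, memory, segment_hiddens):
        # memory: (batch, num_mem, dim), segment_hiddens: (batch, seg_len, dim)
        # old memory queries the freshly computed segment states (and itself)
        context = torch.cat((memory, segment_hiddens), dim=1)
        update, _ = self.attn(memory, context, context)
        return self.norm(memory + update)        # residual update keeps the memory size fixed

updater = AttentionMemoryUpdate(dim=512)
mem = updater.initial_memory(batch=2)
hiddens = torch.randn(2, 128, 512)
mem = updater(mem, hiddens)                      # (2, 64, 512), carried into the next segment
```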
Implementation of GateLoop Transformer in Pytorch and Jax