Some personal experiments around routing tokens to different autoregressive attention modules, akin to mixture-of-experts
MIT License
Published by lucidrains over 1 year ago
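Below is a minimal, hypothetical sketch of the routing idea described above: a lightweight router scores each token against a set of causal attention "experts", and the experts' outputs are combined per token by those gates. For brevity it uses soft routing over full-sequence attention rather than hard dispatch of token subsets, and the names (`MixtureOfAttention`, `CausalAttentionExpert`, `num_experts`) are illustrative assumptions, not this repository's actual API.

```python
# Hypothetical sketch of token routing across causal attention experts.
# Not the repository's implementation; module and argument names are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalAttentionExpert(nn.Module):
    def __init__(self, dim, heads = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first = True)

    def forward(self, x):
        n = x.shape[1]
        # causal mask: True marks positions a token is not allowed to attend to
        mask = torch.ones(n, n, device = x.device).triu(1).bool()
        out, _ = self.attn(x, x, x, attn_mask = mask)
        return out

class MixtureOfAttention(nn.Module):
    def __init__(self, dim, num_experts = 4, heads = 8):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)  # per-token gating logits
        self.experts = nn.ModuleList([
            CausalAttentionExpert(dim, heads) for _ in range(num_experts)
        ])

    def forward(self, x):
        gates = F.softmax(self.router(x), dim = -1)                        # (batch, seq, experts)
        expert_outs = torch.stack([e(x) for e in self.experts], dim = -1)  # (batch, seq, dim, experts)
        # per-token weighted sum of each expert's output
        return (expert_outs * gates.unsqueeze(2)).sum(dim = -1)

tokens = torch.randn(2, 128, 512)
out = MixtureOfAttention(dim = 512)(tokens)  # (2, 128, 512)
```

A hard top-k router that dispatches only a subset of tokens to each expert would recover the sparsity that makes mixture-of-experts attractive; the soft mixture above is just the shortest fully differentiable version of the idea.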
Implementation of MeshGPT, SOTA Mesh generation using Attention, in Pytorch
Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch
Implementation of Make-A-Video, new SOTA text to video generator from Meta AI, in Pytorch
Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"
Implementation of the Equiformer, an SE(3)/E(3)-equivariant attention network that reaches a new SOTA
Implementation of an Attention layer where each head can attend to more than just one token
Implementation of Agent Attention in Pytorch
Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT
Explorations into the recently proposed Taylor Series Linear Attention
Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture
Fast and memory-efficient exact attention
An implementation of Performer, a linear attention-based transformer, in Pytorch
Implementation of E(n)-Transformer, which incorporates attention mechanisms into Welling's E(n)-Equivariant Graph Neural Network
An implementation of local windowed attention for language modeling