Some personal experiments around routing tokens to different autoregressive attention modules, akin to mixture-of-experts
Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture
An implementation of local windowed attention for language modeling
Implementation of Agent Attention in Pytorch
Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch
An implementation of Performer, a linear attention-based transformer, in Pytorch
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch
Implementation of E(n)-Transformer, which incorporates attention mechanisms into Welling's E(n)-E...
Implementation of Make-A-Video, new SOTA text to video generator from Meta AI, in Pytorch
Implementation of the Equiformer, SE3/E3 equivariant attention network that reaches new SOTA, and...
Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT
Explorations into the recently proposed Taylor Series Linear Attention
Implementation of MeshGPT, SOTA Mesh generation using Attention, in Pytorch
Fast and memory-efficient exact attention
Unofficial implementation of iTransformer - SOTA Time Series Forecasting using Attention networks...
Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attenti...
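Several of the entries above revolve around restricted or efficient attention variants. As a point of reference, here is a minimal NumPy sketch of causal sliding-window (local) attention, the idea behind the local windowed attention entry; the function name and the `window` parameter are illustrative, not taken from any of the listed repositories.

```python
import numpy as np

def local_causal_attention(q, k, v, window=4):
    # q, k, v: (seq_len, dim). Each position attends only to itself and
    # the previous `window - 1` positions (a causal sliding window),
    # so cost grows with seq_len * window rather than seq_len ** 2
    # once the mask is exploited.
    seq_len, dim = q.shape
    scores = q @ k.T / np.sqrt(dim)               # (seq_len, seq_len)
    i, j = np.indices((seq_len, seq_len))
    allowed = (j <= i) & (j > i - window)         # causal + local window
    scores = np.where(allowed, scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

This dense version still materializes the full score matrix for clarity; the practical implementations listed above instead bucket the sequence into blocks so memory stays linear in sequence length.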