CUDA implementation of autoregressive linear attention, with all the latest research findings
Implementation of Agent Attention in Pytorch
Implementation of Block Recurrent Transformer - Pytorch
Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning
Some personal experiments around routing tokens to different autoregressive attention mechanisms, akin to mi...
Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT
Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attenti...
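The memory-efficient idea above can be sketched independently of the repo: process keys in chunks with a running max and running sum (online softmax), so the full attention matrix is never materialized. This is a minimal NumPy illustration, not the repo's API; shapes and the chunk size are arbitrary.

```python
import numpy as np

def chunked_attention(q, k, v, chunk=3):
    # Streams over key/value chunks, keeping a running logit max (m),
    # running exp-sum (l), and running weighted-value accumulator (acc),
    # so the (seq_q, seq_k) score matrix is never built in full.
    sq, d = q.shape
    m = np.full((sq, 1), -np.inf)      # running max of logits per query
    l = np.zeros((sq, 1))              # running sum of exp(logit - m)
    acc = np.zeros((sq, v.shape[1]))   # running sum of weights @ values
    for start in range(0, k.shape[0], chunk):
        s = q @ k[start:start + chunk].T / np.sqrt(d)
        m_new = np.maximum(m, s.max(axis=-1, keepdims=True))
        p = np.exp(s - m_new)
        scale = np.exp(m - m_new)      # rescale old stats to the new max
        l = l * scale + p.sum(axis=-1, keepdims=True)
        acc = acc * scale + p @ v[start:start + chunk]
        m = m_new
    return acc / l

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(6, 8)) for _ in range(3))
out = chunked_attention(q, k, v, chunk=3)
```

Because the running statistics are rescaled whenever the max changes, the result matches ordinary softmax attention exactly, only the peak memory differs.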
Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architectu...
A Transformer made of Rotation-equivariant Attention using Vector Neurons
Implementation of Deformable Attention in Pytorch from the paper "Vision Transformer with Deforma...
Explorations into the recently proposed Taylor Series Linear Attention
Implementation of various self-attention mechanisms focused on computer vision. Ongoing repository.
Implementation of Q-Transformer, Scalable Offline Reinforcement Learning via Autoregressive Q-Fun...
Unofficial implementation of iTransformer - SOTA Time Series Forecasting using Attention networks...
An implementation of local windowed attention for language modeling
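The local-window idea is simple enough to show directly: each query attends only to keys within the last `window` positions (causally). A minimal NumPy sketch, not this repo's interface; the window size and tensor shapes are illustrative.

```python
import numpy as np

def local_causal_attention(q, k, v, window=4):
    # q, k, v: (seq, dim). Position i attends to keys j with
    # i - window + 1 <= j <= i, i.e. a causal sliding window.
    seq, dim = q.shape
    scores = q @ k.T / np.sqrt(dim)
    i = np.arange(seq)[:, None]
    j = np.arange(seq)[None, :]
    outside = (j > i) | (j < i - window + 1)   # mask out non-local / future keys
    scores = np.where(outside, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(8, 16)) for _ in range(3))
out = local_causal_attention(q, k, v, window=4)
```

The first position can only attend to itself, so its output is exactly its own value vector; in practice the windowing is done by reshaping into blocks rather than masking a full score matrix, but the result is the same.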
An implementation of Performer, a linear attention-based transformer, in Pytorch
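Performer replaces softmax with a kernel feature map so attention factorizes and runs in linear time. The sketch below uses the simpler elu(x)+1 feature map (Katharopoulos et al. style) rather than Performer's random features, purely to show the causal cumulative-sum trick; it is not this repo's code.

```python
import numpy as np

def causal_linear_attention(q, k, v):
    # phi keeps scores positive; with phi(q_i) . phi(k_j) replacing
    # exp(q_i . k_j), causal attention becomes two running sums.
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    q, k = phi(q), phi(k)
    # kv[t] = sum_{s<=t} phi(k_s) v_s^T ; z[t] = sum_{s<=t} phi(k_s)
    kv = np.cumsum(k[:, :, None] * v[:, None, :], axis=0)  # (seq, dk, dv)
    z = np.cumsum(k, axis=0)                               # (seq, dk)
    num = np.einsum('sd,sde->se', q, kv)
    den = np.einsum('sd,sd->s', q, z)[:, None]
    return num / den

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(6, 8)) for _ in range(3))
out = causal_linear_attention(q, k, v)
```

Because the per-step state (`kv`, `z`) has fixed size, the same recurrence lets an autoregressive decoder run in O(1) memory per generated token instead of re-attending over the whole prefix.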