Implementation of an Attention layer where each head can attend to more than just one token, using coordinate descent to pick the top-k tokens
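
To make the idea concrete, here is a minimal single-head sketch of restricting each query to its top-k keys. The repository relaxes the top-k selection with coordinate descent so the choice stays differentiable; the hard `torch.topk` mask below is a stand-in for that step, and `topk_attention` is an illustrative name, not the repo's API.

```python
# Minimal sketch of single-head attention restricted to the top-k keys per query.
# A hard top-k mask stands in for the repo's differentiable coordinate-descent relaxation.
import torch

def topk_attention(q, k, v, topk = 8):
    # q, k, v: (batch, seq_len, dim)
    scale = q.shape[-1] ** -0.5
    sim = torch.einsum('b i d, b j d -> b i j', q, k) * scale

    # keep only the top-k scores per query, mask out the rest
    topk_vals, _ = sim.topk(topk, dim = -1)
    threshold = topk_vals[..., -1:]                      # k-th largest score per query
    sim = sim.masked_fill(sim < threshold, float('-inf'))

    attn = sim.softmax(dim = -1)
    return torch.einsum('b i j, b j d -> b i d', attn, v)

q = k = v = torch.randn(2, 128, 64)
out = topk_attention(q, k, v, topk = 16)   # (2, 128, 64)
```
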
Implementation of Agent Attention in Pytorch
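
A rough single-head sketch of the agent attention idea, assuming the agent tokens are obtained by average-pooling the queries: a small set of agents first aggregates the keys/values, then the queries read from the agents, giving cost linear in sequence length. The actual implementation is multi-head and adds further details; `agent_attention` is just an illustrative name.

```python
# Sketch of agent attention: softmax(Q Aᵀ) · softmax(A Kᵀ) · V with a few agent tokens A.
import torch
import torch.nn.functional as F

def agent_attention(q, k, v, num_agents = 16):
    # q, k, v: (batch, seq_len, dim)
    b, n, d = q.shape
    scale = d ** -0.5

    # derive agent tokens by pooling the queries down to num_agents positions
    agents = F.adaptive_avg_pool1d(q.transpose(1, 2), num_agents).transpose(1, 2)

    # agents aggregate the sequence (agents act as queries over keys/values)
    agent_attn = (agents @ k.transpose(1, 2) * scale).softmax(dim = -1)   # (b, a, n)
    agent_out = agent_attn @ v                                            # (b, a, d)

    # queries then read from the agents
    query_attn = (q @ agents.transpose(1, 2) * scale).softmax(dim = -1)   # (b, n, a)
    return query_attn @ agent_out                                         # (b, n, d)

out = agent_attention(torch.randn(1, 256, 64), torch.randn(1, 256, 64), torch.randn(1, 256, 64))
```
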
Exploring an idea where one forgets about efficiency and carries out attention across each edge o...
Implementation of Deformable Attention in Pytorch from the paper "Vision Transformer with Deformable Attention"
Implementation of Make-A-Video, new SOTA text to video generator from Meta AI, in Pytorch
An implementation of local windowed attention for language modeling
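
A minimal sketch of the simplest form of local windowed attention, assuming non-overlapping windows and a sequence length divisible by the window size; the actual repository also handles causal masking, look-back into neighboring windows, and relative positions.

```python
# Sketch of non-overlapping local windowed attention: chunk the sequence into
# fixed-size windows and run full attention independently inside each window.
import torch

def windowed_attention(q, k, v, window_size = 64):
    # q, k, v: (batch, seq_len, dim); seq_len assumed divisible by window_size
    b, n, d = q.shape
    w = window_size
    q, k, v = (t.reshape(b, n // w, w, d) for t in (q, k, v))

    sim = torch.einsum('b x i d, b x j d -> b x i j', q, k) * d ** -0.5
    attn = sim.softmax(dim = -1)
    out = torch.einsum('b x i j, b x j d -> b x i d', attn, v)
    return out.reshape(b, n, d)

out = windowed_attention(torch.randn(2, 1024, 64), torch.randn(2, 1024, 64), torch.randn(2, 1024, 64))
```
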
Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch
Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts
Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"
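
A sketch of the chunked-softmax trick that paper describes: keys and values are processed chunk by chunk, and partial results are merged with a running max and running sum, so the full n×n attention matrix is never materialized. Function and variable names below are illustrative, not the repository's API.

```python
# Chunked (memory-efficient) attention: accumulate softmax numerator/denominator
# across key/value chunks, rescaling by a running max for numerical stability.
import torch

def chunked_attention(q, k, v, chunk_size = 256):
    # q, k, v: (batch, seq_len, dim)
    scale = q.shape[-1] ** -0.5
    num = torch.zeros_like(q)                                   # running numerator
    den = q.new_zeros(*q.shape[:-1], 1)                         # running denominator
    running_max = q.new_full((*q.shape[:-1], 1), float('-inf'))

    for k_chunk, v_chunk in zip(k.split(chunk_size, dim = 1), v.split(chunk_size, dim = 1)):
        sim = torch.einsum('b i d, b j d -> b i j', q, k_chunk) * scale
        chunk_max = sim.amax(dim = -1, keepdim = True)
        new_max = torch.maximum(running_max, chunk_max)

        # rescale previous accumulators to the new max, then fold in this chunk
        correction = (running_max - new_max).exp()
        exp_sim = (sim - new_max).exp()
        num = num * correction + exp_sim @ v_chunk
        den = den * correction + exp_sim.sum(dim = -1, keepdim = True)
        running_max = new_max

    return num / den

out = chunked_attention(torch.randn(1, 2048, 64), torch.randn(1, 2048, 64), torch.randn(1, 2048, 64))
```
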
Implementation of the Equiformer, SE3/E3 equivariant attention network that reaches new SOTA, and...
A variant of Transformer-XL where the memory is updated not with a queue, but with attention
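
A toy sketch of that update rule, under the assumption that the memory tokens themselves query the new hidden states and are updated residually from what they read; all names below are made up for illustration and are not the repository's API.

```python
# Memory tokens are refreshed by attending over the latest hidden states,
# instead of being replaced queue-style as in Transformer-XL.
import torch
import torch.nn as nn

class AttentiveMemoryUpdate(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.attend = nn.MultiheadAttention(dim, num_heads = 8, batch_first = True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, memory, hiddens):
        # memory: (batch, mem_len, dim), hiddens: (batch, seq_len, dim)
        update, _ = self.attend(memory, hiddens, hiddens)   # memories query the new states
        return self.norm(memory + update)                   # residual update of the memory

update_memory = AttentiveMemoryUpdate(dim = 512)
memory = torch.randn(2, 32, 512)
hiddens = torch.randn(2, 128, 512)
new_memory = update_memory(memory, hiddens)                 # (2, 32, 512)
```
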
Implementation of Q-Transformer, Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch
Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning
An implementation of Performer, a linear attention-based transformer, in Pytorch
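
A non-causal linear attention sketch in the spirit of Performer's FAVOR+ mechanism: queries and keys pass through a positive random-feature map approximating the softmax kernel, after which attention costs O(n) thanks to the associativity of matrix multiplication. The real repository adds causal variants, orthogonal random features, and periodic feature redrawing; the functions below are illustrative only.

```python
# Linear attention with positive random features: softmax(QKᵀ)V is approximated
# by φ(Q) (φ(K)ᵀ V), which never forms the (n x n) attention matrix.
import torch

def random_feature_map(x, projection):
    # positive features exp(x·w - |x|^2 / 2); projection: (dim, num_features)
    x = x * x.shape[-1] ** -0.25
    proj = x @ projection
    return torch.exp(proj - x.pow(2).sum(dim = -1, keepdim = True) / 2) / projection.shape[-1] ** 0.5

def linear_attention(q, k, v, num_features = 256):
    projection = torch.randn(q.shape[-1], num_features, device = q.device)
    q, k = random_feature_map(q, projection), random_feature_map(k, projection)

    kv = torch.einsum('b n f, b n d -> b f d', k, v)                      # aggregate keys/values once
    z = 1 / (torch.einsum('b n f, b f -> b n', q, k.sum(dim = 1)) + 1e-6) # normalizer per query
    return torch.einsum('b n f, b f d, b n -> b n d', q, kv, z)

out = linear_attention(torch.randn(1, 4096, 64), torch.randn(1, 4096, 64), torch.randn(1, 4096, 64))
```
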
Implementation of Block Recurrent Transformer - Pytorch