Some personal experiments around routing tokens to different autoregressive attention modules, akin to mixture-of-experts
MIT License
Published by lucidrains over 1 year ago
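Below is a minimal, hypothetical sketch of the routing idea described above: a lightweight router scores each token against a set of causal attention "experts", and the experts' outputs are combined per token by those gates. For brevity it uses soft routing over full-sequence attention rather than hard dispatch of token subsets, and the names (`MixtureOfAttention`, `CausalAttentionExpert`, `num_experts`) are illustrative assumptions, not this repository's actual API.

```python
# Hypothetical sketch of token routing across causal attention experts.
# Not the repository's implementation; module and argument names are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalAttentionExpert(nn.Module):
    def __init__(self, dim, heads = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first = True)

    def forward(self, x):
        n = x.shape[1]
        # causal mask: True marks positions a token is not allowed to attend to
        mask = torch.ones(n, n, device = x.device).triu(1).bool()
        out, _ = self.attn(x, x, x, attn_mask = mask)
        return out

class MixtureOfAttention(nn.Module):
    def __init__(self, dim, num_experts = 4, heads = 8):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)  # per-token gating logits
        self.experts = nn.ModuleList([
            CausalAttentionExpert(dim, heads) for _ in range(num_experts)
        ])

    def forward(self, x):
        gates = F.softmax(self.router(x), dim = -1)                        # (batch, seq, experts)
        expert_outs = torch.stack([e(x) for e in self.experts], dim = -1)  # (batch, seq, dim, experts)
        # per-token weighted sum of each expert's output
        return (expert_outs * gates.unsqueeze(2)).sum(dim = -1)

tokens = torch.randn(2, 128, 512)
out = MixtureOfAttention(dim = 512)(tokens)  # (2, 128, 512)
```

A hard top-k router that dispatches only a subset of tokens to each expert would recover the sparsity that makes mixture-of-experts attractive; the soft mixture above is just the shortest fully differentiable version of the idea.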
Implementation of MeshGPT, SOTA Mesh generation using Attention, in Pytorch
Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch
Implementation of Make-A-Video, new SOTA text to video generator from Meta AI, in Pytorch
Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"
Implementation of the Equiformer, an SE(3)/E(3)-equivariant attention network that reaches a new SOTA
Implementation of an Attention layer where each head can attend to more than just one token
Implementation of Agent Attention in Pytorch
Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT
Explorations into the recently proposed Taylor Series Linear Attention
Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture
Fast and memory-efficient exact attention
An implementation of Performer, a linear attention-based transformer, in Pytorch
Implementation of E(n)-Transformer, which incorporates attention mechanisms into Welling's E(n)-Equivariant Graph Neural Network
An implementation of local windowed attention for language modeling