Some personal experiments around routing tokens to different autoregressive attention modules, akin to mixture-of-experts
Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture
An implementation of local windowed attention for language modeling
Implementation of Agent Attention in Pytorch
Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch
An implementation of Performer, a linear attention-based transformer, in Pytorch
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch
Implementation of E(n)-Transformer, which incorporates attention mechanisms into Welling's E(n)-E...
Implementation of Make-A-Video, new SOTA text to video generator from Meta AI, in Pytorch
Implementation of the Equiformer, SE3/E3 equivariant attention network that reaches new SOTA, and...
Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT
Explorations into the recently proposed Taylor Series Linear Attention
Implementation of MeshGPT, SOTA Mesh generation using Attention, in Pytorch
Fast and memory-efficient exact attention
Unofficial implementation of iTransformer - SOTA Time Series Forecasting using Attention networks...
Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attenti...
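Several of the entries above revolve around restricted or efficient attention variants. As a point of reference, here is a minimal NumPy sketch of causal sliding-window (local) attention, the idea behind the local windowed attention entry; the function name and the `window` parameter are illustrative, not taken from any of the listed repositories.

```python
import numpy as np

def local_causal_attention(q, k, v, window=4):
    # q, k, v: (seq_len, dim). Each position attends only to itself and
    # the previous `window - 1` positions (a causal sliding window),
    # so cost grows with seq_len * window rather than seq_len ** 2
    # once the mask is exploited.
    seq_len, dim = q.shape
    scores = q @ k.T / np.sqrt(dim)               # (seq_len, seq_len)
    i, j = np.indices((seq_len, seq_len))
    allowed = (j <= i) & (j > i - window)         # causal + local window
    scores = np.where(allowed, scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

This dense version still materializes the full score matrix for clarity; the practical implementations listed above instead bucket the sequence into blocks so memory stays linear in sequence length.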