Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena
GPT, but made only out of MLPs
Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts
Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch
Implementation of OmniNet, Omnidirectional Representations from Transformers, in Pytorch
Implementation of the Point Transformer layer, in Pytorch
(Unofficial) Implementation of dilated attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens"
A simple cross attention that updates both the source and target sequences in one step (see the sketch after this list)
Implementation of Perceiver AR, Deepmind's new long-context attention network based on Perceiver architecture, in Pytorch
Implementation of gMLP, an all-MLP replacement for Transformers, in Pytorch
Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"
Implementation of Agent Attention in Pytorch
MSA Transformer reproduction code
Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition"
Implementation of Perceiver, General Perception with Iterative Attention, in Pytorch
Implementation of the Transformer variant proposed in "Transformer Quality in Linear Time"
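For the bidirectional cross attention entry above, here is a minimal PyTorch sketch of the idea: a single source-target similarity matrix is normalized along each axis so that both sequences attend to each other and are updated in one step. The class name BidirectionalCrossAttention, its constructor arguments, and the tensor shapes are illustrative assumptions, not the repository's actual API.

import torch
from torch import nn

class BidirectionalCrossAttention(nn.Module):
    # joint attention over two sequences: one similarity matrix, softmaxed
    # along each axis, lets each sequence attend to the other in a single step
    def __init__(self, dim, dim_head=64, heads=8):
        super().__init__()
        self.heads = heads
        self.scale = dim_head ** -0.5
        inner = dim_head * heads
        self.to_qk_a = nn.Linear(dim, inner, bias=False)  # projection for sequence a
        self.to_qk_b = nn.Linear(dim, inner, bias=False)  # projection for sequence b
        self.to_v_a = nn.Linear(dim, inner, bias=False)
        self.to_v_b = nn.Linear(dim, inner, bias=False)
        self.out_a = nn.Linear(inner, dim, bias=False)
        self.out_b = nn.Linear(inner, dim, bias=False)

    def forward(self, a, b):
        h = self.heads
        split = lambda t: t.reshape(t.shape[0], t.shape[1], h, -1).transpose(1, 2)
        qk_a, qk_b = split(self.to_qk_a(a)), split(self.to_qk_b(b))
        v_a, v_b = split(self.to_v_a(a)), split(self.to_v_b(b))

        # shared similarity matrix between the two sequences
        sim = torch.einsum('b h i d, b h j d -> b h i j', qk_a, qk_b) * self.scale

        attn_ab = sim.softmax(dim=-1)  # a attends to b (normalize over j)
        attn_ba = sim.softmax(dim=-2)  # b attends to a (normalize over i)

        out_a = torch.einsum('b h i j, b h j d -> b h i d', attn_ab, v_b)
        out_b = torch.einsum('b h i j, b h i d -> b h j d', attn_ba, v_a)

        merge = lambda t: t.transpose(1, 2).reshape(t.shape[0], -1, h * t.shape[-1])
        return self.out_a(merge(out_a)), self.out_b(merge(out_b))

# usage: both sequences come back updated from a single call
attn = BidirectionalCrossAttention(dim=512)
src, tgt = torch.randn(1, 128, 512), torch.randn(1, 64, 512)
src_out, tgt_out = attn(src, tgt)

The key design point is that the two softmaxes share one einsum result, so the pairwise similarities are computed once rather than separately per direction.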