A Transformer with normalization as its sole non-linearity, as proposed in the paper Normalized Attention Without Probability Cage. This repository builds on the paper's contributions and attempts to make the approach work for the auto-regressive case.
Update: it works. You can build an entire language model out of nothing but matrix multiplications and normalization.
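Below is a minimal, self-contained PyTorch sketch of the idea, purely for illustration: the attention block drops the softmax and instead applies a LayerNorm to the aggregated values, and the feedforward uses a LayerNorm in place of an activation. The module names, norm placement, scaling, and masking details are assumptions, not the repository's actual implementation.

```python
import torch
from torch import nn

class NormalizedCausalAttention(nn.Module):
    """Dot-product attention without a softmax; LayerNorm is the only non-linearity."""
    def __init__(self, dim, heads=8):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.dim_head = dim // heads
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.to_out = nn.Linear(dim, dim, bias=False)
        self.norm = nn.LayerNorm(self.dim_head)  # replaces the softmax "probability cage"

    def forward(self, x):
        b, n, d = x.shape
        h, dh = self.heads, self.dim_head
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        q, k, v = map(lambda t: t.view(b, n, h, dh).transpose(1, 2), (q, k, v))

        # raw, scaled dot-product scores - no softmax anywhere
        scores = (q @ k.transpose(-1, -2)) / (dh ** 0.5)

        # causal mask for the auto-regressive case: future positions get zero weight
        causal = torch.ones(n, n, dtype=torch.bool, device=x.device).tril()
        scores = scores.masked_fill(~causal, 0.)

        # aggregate values with the unnormalized weights, then normalize the aggregate
        out = self.norm(scores @ v)
        out = out.transpose(1, 2).reshape(b, n, d)
        return self.to_out(out)

class NormalizedFeedForward(nn.Module):
    """Two linear maps with a LayerNorm in between, instead of ReLU/GELU."""
    def __init__(self, dim, mult=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim * mult, bias=False),
            nn.LayerNorm(dim * mult),   # the sole non-linearity
            nn.Linear(dim * mult, dim, bias=False),
        )

    def forward(self, x):
        return self.net(x)

# quick shape check
x = torch.randn(1, 16, 64)
y = NormalizedFeedForward(64)(NormalizedCausalAttention(64, heads=8)(x))
print(y.shape)  # torch.Size([1, 16, 64])
```

Stacking blocks like these with residual connections and a final projection to the vocabulary gives a language model whose forward pass contains only matrix multiplies and normalization, which is the setting the training script below exercises.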
```bash
$ pip install -r requirements.txt
$ python train_enwik8.py
```
```bibtex
@misc{richter2020normalized,
    title         = {Normalized Attention Without Probability Cage},
    author        = {Oliver Richter and Roger Wattenhofer},
    year          = {2020},
    eprint        = {2005.09561},
    archivePrefix = {arXiv},
    primaryClass  = {cs.LG}
}
```