Simple implementation of FAVOR attention layer
BSD-3-Clause License
This repository provides a PyTorch implementation of the FAVOR attention layer from the paper
Choromanski, Krzysztof, et al. "Rethinking attention with performers." arXiv preprint arXiv:2009.14794 (2020).
The class accepts the following parameters:
class FAVOR(nn.Module):
    """Fast Attention Via positive Orthogonal Random features"""
    def __init__(
        self,
        key_dim,                 # dimension of the keys
        orthonormal=True,        # whether or not the random features are drawn orthonormal
        causal=False,            # whether or not to use causal ("unidirectional") attention
        m=128,                   # the number of random features used to compute the attention
        redraw=True,             # whether the features should be drawn anew each time
        h=None,                  # feature coefficient (default: sqrt(m))
        f=[F.relu, ],            # function(s) applied to the projections (see paper)
        randomizer=torch.randn,  # the randomizer for the features (default: Gaussian)
        eps=0.0,                 # numerical stabilizer for the renormalization
        kernel_eps=0.001,        # numerical stabilizer added after applying the kernel function
    ):
The default behaviour uses ReLU features, since these are reported to perform best in the paper.
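For example, a layer could be instantiated like this (a minimal sketch based on the signature above; the key dimension and number of random features are illustrative values, not defaults of the library, and the import is described at the end of this README):

import torch
from favor import FAVOR

# FAVOR attention for 64-dimensional keys, using 256 random features;
# all other arguments keep their default values (ReLU features, non-causal).
attention = FAVOR(
    key_dim=64,
    m=256,
)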
The forward function has the following signature:
def forward(self, keys, values, queries):
    """
    keys: (batch, keys_dimension, *keys_locations)
    values: (batch, values_dimension, *keys_locations)
    queries: (batch, keys_dimension, *queries_locations)
    """
For causal attention, keys_locations and queries_locations must be equal.
To install, type
pip install -e .
in the root folder of this repo, and then import the layer with
from favor import FAVOR