Standalone Product Key Memory module in PyTorch, for augmenting Transformer models
Transformer: PyTorch Implementation of "Attention Is All You Need"
Fully featured implementation of Routing Transformer
Transformer based on a variant of attention with linear complexity with respect to sequence length
Sinkhorn Transformer - Practical implementation of Sparse Sinkhorn Attention
Reformer, the efficient Transformer, in PyTorch
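Of the techniques listed above, the linear-complexity attention variant is the simplest to illustrate: replacing the softmax with a positive feature map lets the key-value product be computed once, so cost grows linearly rather than quadratically in sequence length. Below is a minimal non-causal sketch using the common ELU+1 feature map; the function names are illustrative and not the API of any repository listed here.

```python
import torch
import torch.nn.functional as F

def elu_feature_map(x):
    # Positive feature map commonly used in linear attention (ELU(x) + 1 > 0)
    return F.elu(x) + 1

def linear_attention(q, k, v, eps=1e-6):
    # q, k, v: (batch, seq, dim). Non-causal linear attention sketch.
    q = elu_feature_map(q)
    k = elu_feature_map(k)
    # Aggregate keys and values once: (batch, dim, dim) -- O(seq) in sequence length
    kv = torch.einsum('bnd,bne->bde', k, v)
    # Per-query normalizer: phi(q_i) . sum_j phi(k_j)
    z = 1.0 / (torch.einsum('bnd,bd->bn', q, k.sum(dim=1)) + eps)
    # out_i = phi(q_i) @ kv * z_i, again linear in sequence length
    return torch.einsum('bnd,bde,bn->bne', q, kv, z)
```

Because the attention weights still normalize to one, feeding a constant value tensor returns that constant, which is a quick sanity check that the normalization is implemented correctly.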