Implementation of Tranception, an attention network, paired with retrieval, that is SOTA for protein fitness prediction
Implementation of an Attention layer where each head can attend to more than just one token
MSA Transformer reproduction code
Implementation of the Equiformer, SE3/E3 equivariant attention network that reaches new SOTA, and adopted for use by EquiFold for protein folding
Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM
Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch (a toy sketch of the kNN memory idea follows this list)
Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts
Implementation of Agent Attention in Pytorch
Implementation of a Transformer, but completely in Triton
Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch
An implementation of local windowed attention for language modeling (a minimal sketch of the idea follows this list)
Fast and memory-efficient exact attention
Implementation of Block Recurrent Transformer - Pytorch
Implementation of ProteinBERT in Pytorch
Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch
Implementation of MagViT2 Tokenizer in Pytorch
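The local-attention entry above describes windowed attention, where each token attends only to a fixed-size window of recent tokens. Below is a minimal sketch of that idea in plain PyTorch, not the repo's actual API: the function name, shapes, and window convention are assumptions, and a real implementation would bucket the sequence into blocks rather than materialize the full attention matrix.

```python
# Hypothetical minimal sketch of causal local windowed attention,
# for illustration only -- not the local-attention repo's interface.
import torch

def local_causal_attention(q, k, v, window_size):
    """q, k, v: (batch, heads, seq_len, dim). Each query at position i
    attends only to keys in the window (i - window_size, i]."""
    b, h, n, d = q.shape
    scale = d ** -0.5
    scores = torch.einsum('bhid,bhjd->bhij', q, k) * scale  # (b, h, n, n)
    pos = torch.arange(n, device=q.device)
    # causal: j <= i ; local: j within the last `window_size` positions
    mask = (pos[None, :] <= pos[:, None]) & (pos[None, :] > pos[:, None] - window_size)
    scores = scores.masked_fill(~mask, float('-inf'))
    attn = scores.softmax(dim=-1)
    return torch.einsum('bhij,bhjd->bhid', attn, v)

q = k = v = torch.randn(1, 8, 128, 64)
out = local_causal_attention(q, k, v, window_size=32)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```

The dense mask keeps the example short; the point is only that restricting each query to the last `window_size` keys makes attention cost grow linearly with sequence length for a fixed window.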
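Similarly, the Memorizing Transformers entry describes augmenting attention with indexing and retrieval of cached memories. The toy sketch below uses exact top-k search in place of the approximate-nearest-neighbor index (e.g. faiss or ScaNN) a real system would rely on; the `KNNMemory` class and its interface are hypothetical, not the repo's.

```python
# Toy sketch of the kNN memory idea behind Memorizing Transformers:
# cache (key, value) pairs from past segments, retrieve the top-k
# neighbors per query, and attend over them. Exact search stands in
# for an approximate-nearest-neighbor index here.
import torch

class KNNMemory:
    def __init__(self, dim):
        self.keys = torch.empty(0, dim)
        self.values = torch.empty(0, dim)

    def add(self, k, v):                         # k, v: (n, dim)
        self.keys = torch.cat([self.keys, k])
        self.values = torch.cat([self.values, v])

    def search(self, q, topk):                   # q: (n, dim)
        sims = q @ self.keys.t()                 # (n, memory_size)
        idx = sims.topk(topk, dim=-1).indices
        return self.keys[idx], self.values[idx]  # each (n, topk, dim)

dim = 64
memory = KNNMemory(dim)
memory.add(torch.randn(512, dim), torch.randn(512, dim))  # cache a past segment

q = torch.randn(16, dim)
mem_k, mem_v = memory.search(q, topk=8)
attn = torch.einsum('nd,nkd->nk', q, mem_k).softmax(dim=-1)
out = torch.einsum('nk,nkd->nd', attn, mem_v)
print(out.shape)  # torch.Size([16, 64])
```

In the actual architecture the retrieved key/value pairs are attended to alongside the local context, with a learned gate blending the two; this sketch shows only the retrieve-and-attend step.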