Here we will test various linear attention designs.
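To fix terminology before the project list: "linear attention" here means attention whose cost grows linearly in sequence length because keys are aggregated before queries touch them. Below is a minimal non-causal sketch, assuming the `elu(x) + 1` feature map of Katharopoulos et al. (2020); the function name and tensor shapes are illustrative, not this project's API.

```python
# Minimal (non-causal) linear attention sketch.
# Assumed feature map: elu(x) + 1 (Katharopoulos et al., 2020).
# Shapes: (batch, heads, seq, dim).
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    # Kernel feature map keeps the implicit attention weights non-negative.
    q = F.elu(q) + 1
    k = F.elu(k) + 1
    # Aggregate keys against values first: O(n * d^2) instead of O(n^2 * d).
    kv = torch.einsum('bhnd,bhne->bhde', k, v)
    # Per-query normalizer, playing the role of the softmax denominator.
    z = torch.einsum('bhnd,bhd->bhn', q, k.sum(dim=2))
    return torch.einsum('bhnd,bhde->bhne', q, kv) / (z.unsqueeze(-1) + eps)

if __name__ == '__main__':
    q = torch.randn(2, 4, 128, 64)
    k = torch.randn(2, 4, 128, 64)
    v = torch.randn(2, 4, 128, 64)
    print(linear_attention(q, k, v).shape)  # torch.Size([2, 4, 128, 64])
```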
Implementation of Kronecker Attention in PyTorch
(Unofficial) Implementation of dilated attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens"
Minimalistic 3D-parallelism training for large language models
LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence (see the first sketch after this list)
Implementation of the Transformer variant proposed in "Transformer Quality in Linear Time"
MSA Transformer reproduction code
A PyTorch-like dynamic computation graph and neural network framework (MLP, CNN, RNN, Transformer) implemented in NumPy
Implementation of gMLP, an all-MLP replacement for Transformers, in PyTorch (see the second sketch after this list)
RWKV-v2-RNN trained on the Pile. See https://github.com/BlinkDL/RWKV-LM for details.
PyTorch Lightning implementation of Meta Pseudo Labels
Implementation of the Point Transformer layer, in PyTorch
An essential implementation of BYOL in PyTorch + PyTorch Lightning
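The LayerNorm(SmallInit(Embedding)) entry above refers to BlinkDL's trick of initializing the token embedding to tiny values and normalizing it immediately, which is reported to speed up early convergence. A minimal sketch follows; the init scale of 1e-4 and the module name are assumptions for illustration, not that repo's exact code.

```python
# Sketch of LayerNorm(SmallInit(Embedding)).
# Assumed init scale: uniform in [-1e-4, 1e-4].
import torch
import torch.nn as nn

class SmallInitEmbedding(nn.Module):
    def __init__(self, vocab_size, dim, init_scale=1e-4):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        # SmallInit: start every token vector near zero...
        nn.init.uniform_(self.emb.weight, -init_scale, init_scale)
        # ...and let LayerNorm rescale it to a well-conditioned magnitude.
        self.norm = nn.LayerNorm(dim)

    def forward(self, token_ids):
        return self.norm(self.emb(token_ids))

if __name__ == '__main__':
    layer = SmallInitEmbedding(vocab_size=50257, dim=512)
    x = torch.randint(0, 50257, (2, 16))
    print(layer(x).shape)  # torch.Size([2, 16, 512])
```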
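For the gMLP entry, a minimal single-block sketch in the spirit of "Pay Attention to MLPs" (Liu et al., 2021): cross-token mixing comes from one learned linear map over the sequence axis (the Spatial Gating Unit) rather than from attention. Module names are assumptions; the near-identity gate init follows the paper.

```python
# Sketch of one gMLP block with a Spatial Gating Unit (SGU).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialGatingUnit(nn.Module):
    def __init__(self, dim_ffn, seq_len):
        super().__init__()
        self.norm = nn.LayerNorm(dim_ffn // 2)
        # Linear projection across tokens: the only cross-token interaction.
        self.proj = nn.Linear(seq_len, seq_len)
        # Near-identity init keeps the gate close to 1 early in training.
        nn.init.zeros_(self.proj.weight)
        nn.init.ones_(self.proj.bias)

    def forward(self, x):                  # x: (batch, seq, dim_ffn), dim_ffn even
        u, v = x.chunk(2, dim=-1)          # split channels: content and gate
        v = self.norm(v).transpose(1, 2)   # (batch, dim_ffn/2, seq)
        v = self.proj(v).transpose(1, 2)   # mix across the sequence axis
        return u * v

class GMLPBlock(nn.Module):
    def __init__(self, dim, dim_ffn, seq_len):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.fc_in = nn.Linear(dim, dim_ffn)
        self.sgu = SpatialGatingUnit(dim_ffn, seq_len)
        self.fc_out = nn.Linear(dim_ffn // 2, dim)

    def forward(self, x):
        y = F.gelu(self.fc_in(self.norm(x)))
        return x + self.fc_out(self.sgu(y))  # residual connection

if __name__ == '__main__':
    block = GMLPBlock(dim=256, dim_ffn=512, seq_len=64)
    print(block(torch.randn(2, 64, 256)).shape)  # torch.Size([2, 64, 256])
```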