Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google Deepmind, in Pytorch
Implementation of Graph Transformer in Pytorch, for potential use in replicating Alphafold2
A Pytorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models (see the routing sketch after this list)
Implementation of the Remixer Block from the Remixer paper, in Pytorch
Self-contained Pytorch implementation of a Sinkhorn-based router, for mixture of experts or other... (see the Sinkhorn sketch after this list)
Implementation of Feedback Transformer in Pytorch
Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch
Adversarially Learned Inference in Pytorch
Implementation of gMLP, an all-MLP replacement for Transformers, in Pytorch
Implementation of Bottleneck Transformer in Pytorch
Implementation of Fast Transformer in Pytorch
Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts
Implementation of Soft MoE, proposed by Brain's Vision team, in Pytorch
Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in Pytorch
Implementation of a framework for GameNGen in Pytorch
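Several of the entries above (Sparsely-Gated MoE, ST-MoE, PEER) revolve around routed expert layers. As a minimal illustrative sketch of the core top-k routing idea, and not the code of any repo listed here, the following is plain Pytorch; the expert count, hidden width, and `top_k` are arbitrary example values:

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Toy sparsely-gated MoE layer: each token is sent to its top-k experts."""
    def __init__(self, dim, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts, bias=False)  # learned router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):
        b, n, d = x.shape
        tokens = x.reshape(-1, d)                           # flatten tokens for routing
        logits = self.gate(tokens)                          # (num_tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)  # each token's k best experts
        weights = weights.softmax(dim=-1)                   # renormalize over the chosen k
        out = torch.zeros_like(tokens)
        for i, expert in enumerate(self.experts):
            for k in range(self.top_k):
                mask = indices[:, k] == i                   # tokens whose k-th pick is expert i
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(tokens[mask])
        return out.reshape(b, n, d)

moe = TopKMoE(dim=64)
print(moe(torch.randn(2, 16, 64)).shape)  # torch.Size([2, 16, 64])
```

Only `top_k` of the experts run per token, which is how these layers grow parameter count far faster than compute.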
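For the Sinkhorn-based router above, the core trick can likewise be shown in a few lines. This is a toy sketch under assumed shapes and an arbitrary `n_iters=8`, not the repo's implementation: alternating column and row normalization of the token-expert scores yields per-token expert distributions whose total mass is roughly balanced across experts:

```python
import torch

def sinkhorn_route(scores, n_iters=8):
    """Toy Sinkhorn routing over a (num_tokens, num_experts) score matrix."""
    log_p = scores.log_softmax(dim=-1)
    for _ in range(n_iters):
        log_p = log_p - log_p.logsumexp(dim=0, keepdim=True)   # balance mass across experts
        log_p = log_p - log_p.logsumexp(dim=-1, keepdim=True)  # each token's row sums to 1
    return log_p.exp()

assign = sinkhorn_route(torch.randn(16, 4))  # 16 tokens, 4 experts
print(assign.sum(dim=0))      # each expert receives ~16/4 = 4 total mass
print(assign.argmax(dim=-1))  # a balanced hard assignment per token
```

Working in log space keeps the iteration numerically stable, and the balanced column mass is what counteracts the expert collapse that plain softmax routing can suffer.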