Self-contained Pytorch implementation of a Sinkhorn-based router, for mixture of experts or otherwise
Some personal experiments around routing tokens to different autoregressive attention, akin to mi...
Implementation of a framework for GameNGen in Pytorch
A Pytorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the param...
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen...
The Incredible PyTorch: a curated list of tutorials, papers, projects, communities and more relat...
Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in Pytorch
Usable Implementation of "Bootstrap Your Own Latent" self-supervised learning, from Deepmind, in ...
Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google Deepmind, in Pytorch
Implementation of Soft Actor Critic and some of its improvements in Pytorch
Implementation of Bottleneck Transformer in Pytorch
Implementation of Soft MoE, proposed by Brain's Vision team, in Pytorch
Implementation of Graph Transformer in Pytorch, for potential use in replicating Alphafold2
Implementation of gMLP, an all-MLP replacement for Transformers, in Pytorch
Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Py...
Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch
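To give a flavor of the first entry above, here is a minimal sketch of what a Sinkhorn-based router can look like in Pytorch. The class name `SinkhornRouter`, the iteration count, and the hard-assignment step are illustrative assumptions, not the actual API of the repository; Sinkhorn normalization is used here to push the token-to-expert assignment toward a balanced (doubly-stochastic) matrix before picking experts.

```python
import torch

def sinkhorn(scores, n_iters = 8, eps = 1e-8):
    # Turn raw routing scores into an approximately doubly-stochastic
    # assignment by alternating column (per-expert) and row (per-token)
    # normalization - the classic Sinkhorn iteration.
    p = torch.exp(scores - scores.max())  # subtract max for numerical stability
    for _ in range(n_iters):
        p = p / (p.sum(dim = 0, keepdim = True) + eps)  # balance load across experts
        p = p / (p.sum(dim = 1, keepdim = True) + eps)  # each token's weights sum to 1
    return p

class SinkhornRouter(torch.nn.Module):
    # hypothetical module name, for illustration only
    def __init__(self, dim, num_experts, n_iters = 8):
        super().__init__()
        self.to_scores = torch.nn.Linear(dim, num_experts, bias = False)
        self.n_iters = n_iters

    def forward(self, tokens):
        # tokens: (num_tokens, dim)
        scores = self.to_scores(tokens)              # (num_tokens, num_experts)
        assignment = sinkhorn(scores, self.n_iters)  # balanced soft assignment
        expert_index = assignment.argmax(dim = -1)   # hard routing decision
        return expert_index, assignment
```

Because the final Sinkhorn step normalizes over the expert dimension, each token's soft assignment sums to one, while the interleaved column normalization discourages all tokens from collapsing onto a single expert.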