Implementation of Soft MoE, proposed by Brain's Vision team, in Pytorch
Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google Deepmind, in Pytorch
Implementation of gMLP, an all-MLP replacement for Transformers, in Pytorch
A flexible package for multimodal deep learning to combine tabular data with text and images using Wide and Deep models in Pytorch
MegaBlocks, a fast MoE implementation for Pytorch
Implementation of Slot Attention from GoogleAI
Self-contained Pytorch implementation of a Sinkhorn-based router, for mixture of experts or otherwise
Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in Pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
Implementation of Feedback Transformer in Pytorch
Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch
Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts
Implementation of MeshGPT, SOTA Mesh generation using Attention, in Pytorch
A family of open-sourced Mixture-of-Experts (MoE) Large Language Models
A Pytorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models
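
Most of the repositories above revolve around the same core idea: a learned gate routes each token to a small subset of expert feedforward networks, so parameter count grows with the number of experts while per-token compute stays roughly constant. The sketch below is a minimal, illustrative top-k gated MoE layer in Pytorch; the class name, dimensions, and the plain softmax-over-top-k gate are assumptions for illustration, not code taken from any repository listed here.

```python
import torch
from torch import nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    # illustrative sketch of sparsely-gated MoE routing, not from any repo above
    def __init__(self, dim = 512, num_experts = 8, top_k = 2, hidden_mult = 4):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts, bias = False)
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(dim, dim * hidden_mult),
                nn.GELU(),
                nn.Linear(dim * hidden_mult, dim)
            ) for _ in range(num_experts)
        ])

    def forward(self, x):
        # x: (batch, seq, dim) -> flatten tokens for routing
        batch, seq, dim = x.shape
        tokens = x.reshape(-1, dim)

        # gate scores per token, keep only the top-k experts per token
        logits = self.gate(tokens)                        # (tokens, experts)
        weights, indices = logits.topk(self.top_k, dim = -1)
        weights = F.softmax(weights, dim = -1)            # renormalize over the chosen experts

        # dispatch each token to its selected experts and combine weighted outputs
        out = torch.zeros_like(tokens)
        for slot in range(self.top_k):
            for expert_idx, expert in enumerate(self.experts):
                mask = indices[:, slot] == expert_idx
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(tokens[mask])

        return out.reshape(batch, seq, dim)

# usage
x = torch.randn(2, 16, 512)
moe = TopKMoE()
y = moe(x)   # (2, 16, 512)
```

The repositories above differ mainly in how the gate is computed (soft dispatch in Soft MoE, Sinkhorn-based assignment in the Sinkhorn router, product-key retrieval in PEER) and in the auxiliary losses and capacity constraints used to keep experts balanced; the loop over experts here is written for clarity rather than speed.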