Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in PyTorch
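The headline stabilization trick from the ST-MoE paper is the router z-loss, which penalizes large router logits. A minimal sketch of the idea (function name and shapes are illustrative assumptions, not this repository's API):

```python
# Sketch of the ST-MoE router z-loss (illustrative; not this repo's API).
# It squares the log-sum-exp of the raw router logits, discouraging the
# gate from producing large logits and stabilizing MoE training.
import torch

def router_z_loss(router_logits: torch.Tensor) -> torch.Tensor:
    # router_logits: (num_tokens, num_experts) raw gate scores
    z = torch.logsumexp(router_logits, dim=-1)  # (num_tokens,)
    return (z ** 2).mean()

loss = router_z_loss(torch.randn(8, 4))  # 8 tokens routed over 4 experts
```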
MegaBlocks, a light-weight library for efficient mixture-of-experts (MoE) training built on block-sparse operations
Diffusers training with mmengine
LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA a...
Mixture-of-Experts for Large Vision-Language Models
A family of open-source Mixture-of-Experts (MoE) Large Language Models
Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models
Implementation of Soft MoE, proposed by Brain's Vision team, in PyTorch
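Soft MoE replaces hard token-to-expert routing with soft slot assignments: each slot is a weighted average of all tokens, experts process the slots, and each output token softly mixes the slot outputs. A simplified sketch under assumed shapes (the class below is illustrative, not this repository's interface):

```python
# Simplified Soft MoE layer (illustrative; not this repo's API).
import torch
import torch.nn as nn

class SoftMoE(nn.Module):
    def __init__(self, dim, num_experts, slots_per_expert):
        super().__init__()
        self.slot_embeds = nn.Parameter(torch.randn(num_experts * slots_per_expert, dim))
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])
        self.num_experts = num_experts
        self.slots_per_expert = slots_per_expert

    def forward(self, x):                            # x: (batch, tokens, dim)
        logits = torch.einsum('btd,sd->bts', x, self.slot_embeds)
        dispatch = logits.softmax(dim=1)             # over tokens: each slot averages tokens
        combine = logits.softmax(dim=2)              # over slots: each token mixes slot outputs
        slots = torch.einsum('bts,btd->bsd', dispatch, x)
        slots = slots.view(x.shape[0], self.num_experts, self.slots_per_expert, -1)
        outs = torch.stack([exp(slots[:, i]) for i, exp in enumerate(self.experts)], dim=1)
        outs = outs.view(x.shape[0], -1, x.shape[-1])  # (batch, slots, dim)
        return torch.einsum('bts,bsd->btd', combine, outs)

y = SoftMoE(dim=32, num_experts=4, slots_per_expert=2)(torch.randn(2, 16, 32))
```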
A fast MoE implementation for PyTorch
PyTorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen...
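PEER keeps retrieval over a huge expert pool cheap by building on product keys: the query is split in half, each half is scored against only about sqrt(N) sub-keys, and the best combined pairs index into the full pool of N experts. A rough sketch of that retrieval step (function and variable names here are hypothetical, not this repository's API):

```python
# Product-key top-k retrieval sketch (illustrative; not this repo's API).
# Selecting from N = n_sub * n_sub experts costs O(sqrt(N)) instead of O(N).
import torch

def product_key_topk(query, keys1, keys2, k=4):
    # query: (dim,); keys1/keys2: (n_sub, dim // 2) sub-key tables
    q1, q2 = query.chunk(2)
    s1, i1 = (keys1 @ q1).topk(k)        # top-k over first sub-key table
    s2, i2 = (keys2 @ q2).topk(k)        # top-k over second sub-key table
    best = (s1[:, None] + s2[None, :]).flatten().topk(k)  # best combined pairs
    pair = best.indices
    # recover full expert ids from the (sub-key i, sub-key j) pair
    expert_ids = i1[pair // k] * keys2.shape[0] + i2[pair % k]
    return best.values, expert_ids

n_sub, dim = 32, 64                      # 32 * 32 = 1024 "experts"
vals, ids = product_key_topk(torch.randn(dim),
                             torch.randn(n_sub, dim // 2),
                             torch.randn(n_sub, dim // 2))
```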
End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 ...
A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the param...
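The core of sparsely-gated MoE is top-k gating: a learned gate scores every expert per token, only the top-k experts run, and their outputs are mixed with renormalized gate weights. A compact, loop-based sketch (shapes and names are illustrative, not this repository's API):

```python
# Top-k sparsely-gated MoE sketch (illustrative; not this repo's API).
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    def __init__(self, dim, num_experts, k=2):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])
        self.k = k

    def forward(self, x):                             # x: (tokens, dim)
        weights, idx = self.gate(x).topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)             # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e              # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

y = SparseMoE(dim=32, num_experts=4)(torch.randn(10, 32))
```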
minichatgpt - train a ChatGPT-style model in 5 minutes
Self-contained PyTorch implementation of a sinkhorn based router, for mixture of experts or other...
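Sinkhorn routing treats token-to-expert assignment as a balanced transport problem: a few Sinkhorn-Knopp iterations alternately normalize the score matrix over experts and over tokens, nudging it toward a balanced soft assignment. A bare-bones sketch of the idea (the function below is a guess at the technique, not this repository's implementation):

```python
# Sinkhorn-style routing sketch (illustrative; not this repo's API).
import torch

def sinkhorn_route(scores: torch.Tensor, n_iters: int = 8, temperature: float = 1.0):
    # scores: (num_tokens, num_experts) raw token-expert similarities
    p = torch.exp(scores / temperature)
    for _ in range(n_iters):
        p = p / p.sum(dim=0, keepdim=True)   # columns: balance load across experts
        p = p / p.sum(dim=1, keepdim=True)   # rows: each token's weights sum to 1
    return p                                 # approximately balanced soft assignment

assignment = sinkhorn_route(torch.randn(16, 4))  # 16 tokens, 4 experts
```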
Some personal experiments around routing tokens to different autoregressive attention, akin to mi...
The Incredible PyTorch: a curated list of tutorials, papers, projects, communities and more relat...