Sinkhorn Router - Pytorch (wip)

Self-contained PyTorch implementation of a Sinkhorn-based router, for mixture of experts or otherwise. It will contain both a causal and a non-causal variant. The causal variant will follow the example used in Megatron.

Install

$ pip install sinkhorn-router-pytorch

Usage

import torch
from torch import nn
from sinkhorn_router_pytorch import SinkhornRouter

experts = nn.Parameter(torch.randn(8, 8, 512, 256)) # (experts, heads, dim [in], dim [out])

router = SinkhornRouter(
    dim = 512,
    experts = experts,
    competitive = True,
    causal = False,
)

x = torch.randn(1, 8, 1017, 512)
out = router(x) # (1, 8, 1017, 256)
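
For intuition about what a Sinkhorn-based router does, here is a minimal sketch of the non-causal idea. It is not this library's implementation. It is written in plain Python rather than PyTorch so that it runs without dependencies, and the names `sinkhorn` and `route` as well as the iteration count are assumptions made for illustration. The token-to-expert scores are exponentiated, then columns and rows are normalized in turn, so that each token distributes a total weight of 1 while each expert receives roughly the same total load. Each token is then sent to its highest-weight expert.

```python
import math

def sinkhorn(scores, iters=50):
    """Alternately normalize columns and rows of exp(scores).

    scores: list of rows, one row per token, one column per expert.
    Returns a matrix whose rows sum to 1 and whose columns sum to
    approximately n_tokens / n_experts (a balanced load per expert).
    """
    n_tokens, n_experts = len(scores), len(scores[0])
    # subtract the global max before exponentiating, for numerical stability
    top = max(max(row) for row in scores)
    plan = [[math.exp(s - top) for s in row] for row in scores]
    target_load = n_tokens / n_experts
    for _ in range(iters):
        # column step: rescale every expert's column to the target load
        col_sums = [sum(row[e] for row in plan) for e in range(n_experts)]
        plan = [[row[e] * target_load / col_sums[e] for e in range(n_experts)]
                for row in plan]
        # row step: rescale every token's weights to sum to 1
        plan = [[p / sum(row) for p in row] for row in plan]
    return plan

def route(scores, iters=50):
    """Pick, for each token, the expert with the largest balanced weight."""
    plan = sinkhorn(scores, iters)
    return [max(range(len(row)), key=row.__getitem__) for row in plan]

# hypothetical scores: 4 tokens, 2 experts, every token prefers expert 0
scores = [[2.0, 0.0], [1.5, 0.0], [1.0, 0.0], [0.5, 0.0]]
print(route(scores))  # → [0, 0, 1, 1]
```

A plain argmax over `scores` would send all four tokens to expert 0. After the normalization, the two tokens with the weakest preference are moved to expert 1, which is the load-balancing behaviour the router is after.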

Citations

@article{Shoeybi2019MegatronLMTM,
    title   = {Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism},
    author  = {Mohammad Shoeybi and Mostofa Patwary and Raul Puri and Patrick LeGresley and Jared Casper and Bryan Catanzaro},
    journal = {ArXiv},
    year    = {2019},
    volume  = {abs/1909.08053},
    url     = {https://api.semanticscholar.org/CorpusID:202660670}
}

@article{Anthony2024BlackMambaMO,
    title   = {BlackMamba: Mixture of Experts for State-Space Models},
    author  = {Quentin Anthony and Yury Tokpanov and Paolo Glorioso and Beren Millidge},
    journal = {ArXiv},
    year    = {2024},
    volume  = {abs/2402.01771},
    url     = {https://api.semanticscholar.org/CorpusID:267413070}
}

@article{Csordas2023SwitchHeadAT,
    title   = {SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention},
    author  = {R{\'o}bert Csord{\'a}s and Piotr Piekos and Kazuki Irie and J{\"u}rgen Schmidhuber},
    journal = {ArXiv},
    year    = {2023},
    volume  = {abs/2312.07987},
    url     = {https://api.semanticscholar.org/CorpusID:266191825}
}

Package Rankings

Top 34.58% on PyPI.org

Related Projects

g-mlp-pytorch

Implementation of gMLP, an all-MLP replacement for Transformers, in Pytorch

18 May 2021 422

graph-transformer-pytorch

Implementation of Graph Transformer in Pytorch, for potential use in replicating Alphafold2

18 Jun 2021 197

SAC-pytorch

Implementation of Soft Actor Critic and some of its improvements in Pytorch

12 Feb 2024 34

MEGABYTE-pytorch

Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Py...

15 May 2023 620

byol-pytorch

Usable Implementation of "Bootstrap Your Own Latent" self-supervised learning, from Deepmind, in ...

16 Jun 2020 1,687

soft-moe-pytorch

Implementation of Soft MoE, proposed by Brain's Vision team, in Pytorch

04 Aug 2023 239

mixture-of-attention

Some personal experiments around routing tokens to different autoregressive attention, akin to mi...

21 Apr 2023 101

st-moe-pytorch

Implementation of ST-Moe, the latest incarnation of MoE after years of research at Brain, in Pytorch

26 Mar 2023 285

mirasol-pytorch

Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google Deepmind, in Pytorch

18 Nov 2023 84

CoLT5-attention

Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch

20 Mar 2023 223

gamengen-pytorch

Implementation of a framework for Gamengen in Pytorch

28 Aug 2024 89

mixture-of-experts

A Pytorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the param...

13 Jul 2020 624

the-incredible-pytorch

The Incredible PyTorch: a curated list of tutorials, papers, projects, communities and more relat...

11 Feb 2017 11,389

bottleneck-transformer-pytorch

Implementation of Bottleneck Transformer in Pytorch

28 Jan 2021 670

PEER-pytorch

Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen...

09 Jul 2024 109