tutel

Tutel MoE: An Optimized Mixture-of-Experts Implementation

MIT License

Stars: 716
Committers: 15


tutel - Tutel v0.3.2 (Latest Release)

Published by ghostplant 5 months ago

What's New in v0.3.2:

  1. Add tutel.net.all_to_all_v & tutel.net.all_gather_v for dispatching messages of dynamic sizes.
  2. Add a --use_tensorcore option for benchmarking in tutel.examples.helloworld.
  3. Read the TUTEL_GLOBAL_TIMEOUT_SEC environment variable to configure the NCCL timeout setting (see the command sketch after this list).
  4. Extend tutel.examples.helloworld_custom_expert to show how to override expert layers with a custom implementation.
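
For reference, the options from items 2 and 3 compose on the command line when running the bundled benchmark. A minimal sketch (the timeout value is illustrative, and any further benchmark arguments are omitted):

TUTEL_GLOBAL_TIMEOUT_SEC=1800 python3 -m tutel.examples.helloworld --use_tensorcore
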
How to Setup:
python3 -m pip install -v -U --no-build-isolation https://github.com/microsoft/tutel/archive/refs/tags/v0.3.2.tar.gz
tutel - Tutel v0.3.1

Published by ghostplant 10 months ago

What's New in v0.3.1:

  1. Add two additional collective communication primitives, net.batch_all_to_all_v() and net.batch_all_gather_v(), for exchanging messages of dynamic sizes (see the sketch after this list).
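
As background, a variable-size ("_v") collective lets every rank exchange a differently sized message with each peer. The sketch below illustrates the idea with stock PyTorch (torch.distributed.all_to_all_single with explicit split sizes) rather than with Tutel's own primitives, whose exact signatures this page does not document:

import torch
import torch.distributed as dist

def variable_all_to_all(tensor, input_splits, output_splits, group=None):
    # Each rank sends input_splits[i] rows to rank i and receives
    # output_splits[i] rows from rank i; per-peer row counts may differ.
    output = tensor.new_empty((sum(output_splits),) + tensor.shape[1:])
    dist.all_to_all_single(output, tensor,
                           output_split_sizes=output_splits,
                           input_split_sizes=input_splits,
                           group=group)
    return output
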
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.3.1.tar.gz
tutel - Tutel v0.3.0

Published by ghostplant about 1 year ago

What's New in v0.3.0:

  1. Support Megablocks-style dMoE inference (see README.md for more information).
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.3.0.tar.gz
tutel - Tutel v0.2.1

Published by ghostplant over 1 year ago

What's New in v0.2.1:

  1. Support Switchable Parallelism, demonstrated in the example tutel.examples.helloworld_switch (see the command after this list).
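
The switching example can be launched like the other bundled examples; arguments documented in the repository are omitted here:

python3 -m tutel.examples.helloworld_switch
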
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.2.1.tar.gz
tutel - Tutel v0.2.0

Published by ghostplant about 2 years ago

What's New in v0.2.0:

  1. Support Windows installation with Python 3 + Torch;
  2. Add examples that enable Tutel MoE in Fairseq;
  3. Refactor the MoE layer implementation so that all features (e.g. top-X, overlap, parallel_type, capacity, ..) can change across forward iterations;
  4. New features: load_importance_loss, cosine router, inequivalent_tokens;
  5. Extend capacity_factor to accept zero and negative values for smarter capacity estimation (see the sketch after this list);
  6. Add tutel.checkpoint conversion tools to reformat checkpoint files, so that existing checkpoints can be used to train or infer with a different world size.
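
Item 5's extended capacity_factor range is a configuration knob. A hypothetical one-liner, assuming the factor sits in the gate_type dictionary used throughout these releases (its placement and the exact zero/negative semantics are assumptions; the project README is authoritative):

# 0.0 (or a negative value) asks Tutel to estimate capacity itself (assumed semantics)
gate_type = {'type': 'top', 'k': 2, 'capacity_factor': 0.0}
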
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.2.0.tar.gz
tutel - Tutel v0.1.5

Published by ghostplant over 2 years ago

What's New in v0.1.5:

  1. Add a 2D hierarchical a2a algorithm for extremely large-scale runs;
  2. Support different parallel_type settings for MoE computation: data, model, auto (see the sketch after this list);
  3. Combine different expert granularities (e.g. normal experts, sharded experts, Megatron dense FFN) under the same programming interface & style;
  4. New feature: is_postscore, which specifies whether gating scores are applied during encoding or decoding;
  5. Enhance existing features: JIT compiler, a2a overlap with 2D.
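
Item 2's parallel_type would be selected where the MoE layer is configured. A hypothetical sketch (the value names come from this release note, but passing parallel_type as a moe_layer argument is an assumption; see the fuller construction example under v0.1.2 below):

from tutel import moe as tutel_moe  # assumed import path, per project README

moe = tutel_moe.moe_layer(
    gate_type={'type': 'top', 'k': 2},
    model_dim=1024,
    experts={'type': 'ffn', 'count_per_node': 2, 'hidden_size_per_expert': 4096},
    parallel_type='auto',  # assumed kwarg; 'data' and 'model' are the other values named above
)
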
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.1.5.tar.gz

Contributors: @abuccts, @yzygitzh, @ghostplant, @EricWangCN.

tutel - Tutel v0.1.4

Published by ghostplant over 2 years ago

What's New in v0.1.4:

  1. Enhance communication features: a2a overlap with computation, support for different granularities of group creation, etc.;
  2. Add a single-thread CPU implementation for correctness checks & reference;
  3. Refine the JIT compiler interface for flexible usability: jit::inject_source and jit::jit_execute;
  4. Enhance examples: fp64 support, CUDA AMP, checkpointing, etc.;
  5. Support execution inside torch.distributed.pipeline (see the sketch after this list).
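
Item 5 refers to PyTorch's experimental pipeline-parallel wrapper. The sketch below shows the stock torch.distributed.pipeline setup that an MoE stage would plug into; the linear stages are placeholders, and two GPUs are assumed:

import os
import torch
import torch.nn as nn
import torch.distributed.rpc as rpc
from torch.distributed.pipeline.sync import Pipe

# Pipe requires the RPC framework, even in a single process.
os.environ.setdefault('MASTER_ADDR', 'localhost')
os.environ.setdefault('MASTER_PORT', '29500')
rpc.init_rpc('worker', rank=0, world_size=1)

stage1 = nn.Linear(1024, 1024).to('cuda:0')
stage2 = nn.Linear(1024, 1024).to('cuda:1')  # placeholder for a stage containing an MoE layer
model = Pipe(nn.Sequential(stage1, stage2), chunks=2)

# Pipe's forward returns an RRef; local_value() fetches the output tensor.
y = model(torch.randn(8, 1024, device='cuda:0')).local_value()
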
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.1.4.tar.gz

Contributors: @yzygitzh, @ghostplant, @EricWangCN.

tutel - Tutel v0.1.3

Published by ghostplant almost 3 years ago

What's New in v0.1.3:

  1. Add Tutel Launcher Support based on Open MPI;
  2. Support Establishing Data/Model Parallelism in Initialization;
  3. Support a Single Expert Evenly Sharded on Multiple GPUs;
  4. Support a List of Gates and Forwarding the MoE Layer with a Specified Gating Index (see the sketch after this list);
  5. Fix NVRTC Compatibility when Enabling USE_NVRTC=1;
  6. Other Implementation Enhancements & Correctness Checking.
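
Item 4 implies the forward pass can pick one gate out of a configured list. A hypothetical one-liner building on the construction sketch under v0.1.2 below (the gate_index keyword is inferred from the feature description, not a documented signature):

y = moe_layer(x, gate_index=0)  # route with the first gate in the list (assumed keyword)
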
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.1.3.tar.gz

Contributors: @ghostplant, @EricWangCN, @guoshzhao.

tutel - Tutel v0.1.2

Published by ghostplant almost 3 years ago

What's New in v0.1.2:

  1. General-purpose top-k gating with {'type': 'top', 'k': 2} (see the construction sketch after this list);
  2. Add Megatron-LM Tensor Parallel as a gating type;
  3. Add DeepSpeed-based & Megatron-based helloworld examples for fair comparison;
  4. Add torch.bfloat16 datatype support for single-GPU runs.
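
Item 1's gating dictionary is quoted verbatim above, so a fuller construction sketch is possible here. This is a minimal single-GPU sketch assuming the moe_layer API shape from the project README; the import path, experts dictionary, and sizes are illustrative:

import torch
from tutel import moe as tutel_moe  # assumed import path, per project README

moe_layer = tutel_moe.moe_layer(
    gate_type={'type': 'top', 'k': 2},   # general-purpose top-k gating (item 1)
    model_dim=1024,                      # input/output feature size
    experts={'type': 'ffn',              # feed-forward experts
             'count_per_node': 2,
             'hidden_size_per_expert': 4096},
).to('cuda')

x = torch.randn(8, 1024, device='cuda')  # (tokens, model_dim)
y = moe_layer(x)                         # output keeps the input shape
print(y.shape)
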
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.1.2.tar.gz

Contributors: @ghostplant, @EricWangCN, @foreveronehundred.

tutel - Tutel v0.1.1

Published by ghostplant about 3 years ago

What's New in v0.1.1:

  1. Enable fp16 support for AMD GPUs.
  2. Use NVRTC for JIT compilation when available.
  3. Add a new system_init interface for initializing NUMA settings on distributed GPUs.
  4. Extend more gating types: Top3Gate & Top4Gate.
  5. Allow higher-level code to change the capacity value in the Tutel fast dispatcher.
  6. Add a custom AllToAll extension for older PyTorch versions without builtin AllToAll operator support.
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.1.1.tar.gz

Contributors: @jspark1105, @ngoyal2707, @guoshzhao, @ghostplant.

tutel - Tutel v0.1.0

Published by ghostplant about 3 years ago

The first version of Tutel, an efficient MoE implementation.

How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.1.0.tar.gz