tutel

Tutel MoE: An Optimized Mixture-of-Experts Implementation

MIT License

Stars: 716
Committers: 15


tutel - Tutel v0.3.2 (Latest Release)

Published by ghostplant 5 months ago

What's New in v0.3.2:

  1. Add tutel.net.all_to_all_v & tutel.net.all_gather_v for dispatching messages of dynamic sizes.
  2. Add a --use_tensorcore option for benchmarking in tutel.examples.helloworld.
  3. Read the TUTEL_GLOBAL_TIMEOUT_SEC environment variable to configure the NCCL timeout setting (see the command sketch after this list).
  4. Extend tutel.examples.helloworld_custom_expert to show how to override expert layers with a custom implementation.
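
For reference, the options from items 2 and 3 compose on the command line when running the bundled benchmark. A minimal sketch (the timeout value is illustrative, and any further benchmark arguments are omitted):

TUTEL_GLOBAL_TIMEOUT_SEC=1800 python3 -m tutel.examples.helloworld --use_tensorcore
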
How to Setup:
python3 -m pip install -v -U --no-build-isolation https://github.com/microsoft/tutel/archive/refs/tags/v0.3.2.tar.gz
tutel - Tutel v0.3.1

Published by ghostplant 10 months ago

What's New in v0.3.1:

  1. Add two additional collective communication primitives, net.batch_all_to_all_v() and net.batch_all_gather_v(), for exchanging messages of dynamic sizes (see the sketch after this list).
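
As background, a variable-size ("_v") collective lets every rank exchange a differently sized message with each peer. The sketch below illustrates the idea with stock PyTorch (torch.distributed.all_to_all_single with explicit split sizes) rather than with Tutel's own primitives, whose exact signatures this page does not document:

import torch
import torch.distributed as dist

def variable_all_to_all(tensor, input_splits, output_splits, group=None):
    # Each rank sends input_splits[i] rows to rank i and receives
    # output_splits[i] rows from rank i; per-peer row counts may differ.
    output = tensor.new_empty((sum(output_splits),) + tensor.shape[1:])
    dist.all_to_all_single(output, tensor,
                           output_split_sizes=output_splits,
                           input_split_sizes=input_splits,
                           group=group)
    return output
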
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.3.1.tar.gz
tutel - Tutel v0.3.0

Published by ghostplant about 1 year ago

What's New in v0.3.0:

  1. Support Megablocks-style dMoE inference (see README.md for more information).
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.3.0.tar.gz
tutel - Tutel v0.2.1

Published by ghostplant over 1 year ago

What's New in v0.2.1:

  1. Support Switchable Parallelism, demonstrated in the example tutel.examples.helloworld_switch (see the command after this list).
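
The switching example can be launched like the other bundled examples; arguments documented in the repository are omitted here:

python3 -m tutel.examples.helloworld_switch
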
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.2.1.tar.gz
tutel - Tutel v0.2.0

Published by ghostplant about 2 years ago

What's New in v0.2.0:

  1. Support Windows installation with Python 3 + Torch;
  2. Add examples that enable Tutel MoE in Fairseq;
  3. Refactor the MoE layer implementation so that all features (e.g. top-X, overlap, parallel_type, capacity, ..) can change across forward iterations;
  4. New features: load_importance_loss, cosine router, inequivalent_tokens;
  5. Extend capacity_factor to accept zero and negative values for smarter capacity estimation (see the sketch after this list);
  6. Add tutel.checkpoint conversion tools to reformat checkpoint files, so that existing checkpoints can be used to train or infer with a different world size.
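
Item 5's extended capacity_factor range is a configuration knob. A hypothetical one-liner, assuming the factor sits in the gate_type dictionary used throughout these releases (its placement and the exact zero/negative semantics are assumptions; the project README is authoritative):

# 0.0 (or a negative value) asks Tutel to estimate capacity itself (assumed semantics)
gate_type = {'type': 'top', 'k': 2, 'capacity_factor': 0.0}
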
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.2.0.tar.gz
tutel - Tutel v0.1.5

Published by ghostplant over 2 years ago

What's New in v0.1.5:

  1. Add a 2D hierarchical a2a algorithm for extremely large-scale runs;
  2. Support different parallel_type settings for MoE computation: data, model, auto (see the sketch after this list);
  3. Combine different expert granularities (e.g. normal experts, sharded experts, Megatron dense FFN) under the same programming interface & style;
  4. New feature: is_postscore, which specifies whether gating scores are applied during encoding or decoding;
  5. Enhance existing features: JIT compiler, a2a overlap with 2D.
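
Item 2's parallel_type would be selected where the MoE layer is configured. A hypothetical sketch (the value names come from this release note, but passing parallel_type as a moe_layer argument is an assumption; see the fuller construction example under v0.1.2 below):

from tutel import moe as tutel_moe  # assumed import path, per project README

moe = tutel_moe.moe_layer(
    gate_type={'type': 'top', 'k': 2},
    model_dim=1024,
    experts={'type': 'ffn', 'count_per_node': 2, 'hidden_size_per_expert': 4096},
    parallel_type='auto',  # assumed kwarg; 'data' and 'model' are the other values named above
)
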
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.1.5.tar.gz

Contributors: @abuccts, @yzygitzh, @ghostplant, @EricWangCN.

tutel - Tutel v0.1.4

Published by ghostplant over 2 years ago

What's New in v0.1.4:

  1. Enhance communication features: a2a overlap with computation, support for different granularities of group creation, etc.;
  2. Add a single-thread CPU implementation for correctness checks & reference;
  3. Refine the JIT compiler interface for flexible usability: jit::inject_source and jit::jit_execute;
  4. Enhance examples: fp64 support, CUDA AMP, checkpointing, etc.;
  5. Support execution inside torch.distributed.pipeline (see the sketch after this list).
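
Item 5 refers to PyTorch's experimental pipeline-parallel wrapper. The sketch below shows the stock torch.distributed.pipeline setup that an MoE stage would plug into; the linear stages are placeholders, and two GPUs are assumed:

import os
import torch
import torch.nn as nn
import torch.distributed.rpc as rpc
from torch.distributed.pipeline.sync import Pipe

# Pipe requires the RPC framework, even in a single process.
os.environ.setdefault('MASTER_ADDR', 'localhost')
os.environ.setdefault('MASTER_PORT', '29500')
rpc.init_rpc('worker', rank=0, world_size=1)

stage1 = nn.Linear(1024, 1024).to('cuda:0')
stage2 = nn.Linear(1024, 1024).to('cuda:1')  # placeholder for a stage containing an MoE layer
model = Pipe(nn.Sequential(stage1, stage2), chunks=2)

# Pipe's forward returns an RRef; local_value() fetches the output tensor.
y = model(torch.randn(8, 1024, device='cuda:0')).local_value()
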
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.1.4.tar.gz

Contributors: @yzygitzh, @ghostplant, @EricWangCN.

tutel - Tutel v0.1.3

Published by ghostplant almost 3 years ago

What's New in v0.1.3:

  1. Add Tutel Launcher Support based on Open MPI;
  2. Support Establishing Data/Model Parallelism in Initialization;
  3. Support a Single Expert Evenly Sharded on Multiple GPUs;
  4. Support a List of Gates and Forwarding the MoE Layer with a Specified Gating Index (see the sketch after this list);
  5. Fix NVRTC Compatibility when Enabling USE_NVRTC=1;
  6. Other Implementation Enhancements & Correctness Checking.
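
Item 4 implies the forward pass can pick one gate out of a configured list. A hypothetical one-liner building on the construction sketch under v0.1.2 below (the gate_index keyword is inferred from the feature description, not a documented signature):

y = moe_layer(x, gate_index=0)  # route with the first gate in the list (assumed keyword)
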
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.1.3.tar.gz

Contributors: @ghostplant, @EricWangCN, @guoshzhao.

tutel - Tutel v0.1.2

Published by ghostplant almost 3 years ago

What's New in v0.1.2:

  1. General-purpose top-k gating with {'type': 'top', 'k': 2} (see the construction sketch after this list);
  2. Add Megatron-LM Tensor Parallel as a gating type;
  3. Add DeepSpeed-based & Megatron-based helloworld examples for fair comparison;
  4. Add torch.bfloat16 datatype support for single-GPU runs.
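
Item 1's gating dictionary is quoted verbatim above, so a fuller construction sketch is possible here. This is a minimal single-GPU sketch assuming the moe_layer API shape from the project README; the import path, experts dictionary, and sizes are illustrative:

import torch
from tutel import moe as tutel_moe  # assumed import path, per project README

moe_layer = tutel_moe.moe_layer(
    gate_type={'type': 'top', 'k': 2},   # general-purpose top-k gating (item 1)
    model_dim=1024,                      # input/output feature size
    experts={'type': 'ffn',              # feed-forward experts
             'count_per_node': 2,
             'hidden_size_per_expert': 4096},
).to('cuda')

x = torch.randn(8, 1024, device='cuda')  # (tokens, model_dim)
y = moe_layer(x)                         # output keeps the input shape
print(y.shape)
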
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.1.2.tar.gz

Contributors: @ghostplant, @EricWangCN, @foreveronehundred.

tutel - Tutel v0.1.1

Published by ghostplant about 3 years ago

What's New in v0.1.1:

  1. Enable fp16 support for AMD GPUs.
  2. Use NVRTC for JIT compilation when available.
  3. Add a new system_init interface for initializing NUMA settings on distributed GPUs.
  4. Extend more gating types: Top3Gate & Top4Gate.
  5. Allow higher-level code to change the capacity value in the Tutel fast dispatcher.
  6. Add a custom AllToAll extension for older PyTorch versions without builtin AllToAll operator support.
How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.1.1.tar.gz

Contributors: @jspark1105, @ngoyal2707, @guoshzhao, @ghostplant.

tutel - Tutel v0.1.0

Published by ghostplant about 3 years ago

The first version of Tutel, an efficient MoE implementation.

How to Setup:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.1.0.tar.gz