Published by laekov over 1 year ago
n_expert > 1
and more bug fixes.
Published by laekov over 2 years ago
Published by laekov almost 3 years ago
mp_group is renamed to slice_group, indicating that all workers in the group receive the same input batch, and process a slice of the input. mp_group will be deprecated in our next release.
FMoELinear is moved to a stand-alone file.
has_loss is added to each gate, in order to identify whether balance loss should be collected.
mp_group, instead of expert parallelism.
MegatronMLP.
test_ddp.py.
Published by laekov about 3 years ago
USE_NCCL by default.
<1.8.0 and >=1.8.0.
Published by laekov over 3 years ago
Published by laekov over 3 years ago
Published by laekov over 3 years ago
First public release with basic distributed MoE functions, tested with Megatron-LM and Transformer-XL.