Utilities for PyTorch distributed
Licensed under the Apache 2.0 license.
There is an example script that trains a classifier on CIFAR-10 using DDP in examples/train_classifier.py. It can be run on CPU or a single GPU with:
python examples/train_classifier.py
and on multiple GPUs with:
torchrun --nproc-per-node gpu examples/train_classifier.py
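For reference, here is a minimal sketch of the DDP setup boilerplate such a script needs, handling both plain python and torchrun launches. This is an illustration of the pattern, not the actual contents of examples/train_classifier.py:

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup():
    # When not launched by torchrun, fill in the env vars for a one-process group.
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")
    os.environ.setdefault("LOCAL_RANK", "0")
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl" if torch.cuda.is_available() else "gloo")
    local_rank = int(os.environ["LOCAL_RANK"])
    return torch.device("cuda", local_rank) if torch.cuda.is_available() else torch.device("cpu")

device = setup()
model = torch.nn.Linear(32 * 32 * 3, 10).to(device)  # stand-in for the real classifier
model = DDP(model, device_ids=[device.index] if device.type == "cuda" else None)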
torchrun_slurm is a wrapper for torchrun that can be used with Slurm. It is a drop-in replacement for torchrun: it automatically sets the --nnodes, --node-rank, and --master-addr arguments from $SLURM_NPROCS, $SLURM_PROCID, and the first node in $SLURM_JOB_NODELIST, respectively. It also sets --nproc-per-node to the number of GPUs on the node (you can override this by setting it explicitly).
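To illustrate the idea (this is a sketch, not torchrun_slurm's actual source), such a wrapper can derive those arguments from Slurm's environment and then hand off to torchrun:

import os
import subprocess
import sys
import torch

# scontrol expands Slurm's compact node list (e.g. "node[01-04]") into hostnames;
# the first node hosts the rendezvous.
nodelist = subprocess.run(
    ["scontrol", "show", "hostnames", os.environ["SLURM_JOB_NODELIST"]],
    capture_output=True, text=True, check=True,
).stdout.split()

user_args = sys.argv[1:]
cmd = ["torchrun",
       "--nnodes", os.environ["SLURM_NPROCS"],
       "--node-rank", os.environ["SLURM_PROCID"],
       "--master-addr", nodelist[0]]
# Default --nproc-per-node to the local GPU count unless the user set it.
if not any(arg.startswith("--nproc-per-node") for arg in user_args):
    cmd += ["--nproc-per-node", str(max(torch.cuda.device_count(), 1))]
os.execvp(cmd[0], cmd + user_args)  # replace this process with torchrun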
Test cases should run on CPU and GPU, both with and without torchrun:
python -m torch_dist_utils.test --device-type cpu
python -m torch_dist_utils.test --device-type cuda
CUDA_VISIBLE_DEVICES="" torchrun --nproc-per-node 4 -m torch_dist_utils.test --device-type cpu
torchrun --nproc-per-node gpu -m torch_dist_utils.test --device-type cuda
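The tests themselves amount to collective smoke tests parameterized by --device-type. A hypothetical example in the same spirit (the function name and exact checks are assumptions, not the suite's actual contents):

import torch
import torch.distributed as dist

def test_all_reduce(device):
    # Assumes a process group is already initialized (see the setup sketch above).
    x = torch.ones(1, device=device) * (dist.get_rank() + 1)
    dist.all_reduce(x)  # defaults to SUM across all ranks
    n = dist.get_world_size()
    assert x.item() == n * (n + 1) / 2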
Test cases should also run on multiple nodes. To simulate this on a single machine, run:
CUDA_VISIBLE_DEVICES="" torchrun --master-addr localhost --master-port 25500 --nnodes 2 --nproc-per-node 4 --node-rank 0 -m torch_dist_utils.test --device-type cpu
in one terminal, and
CUDA_VISIBLE_DEVICES="" torchrun --master-addr localhost --master-port 25500 --nnodes 2 --nproc-per-node 4 --node-rank 1 -m torch_dist_utils.test --device-type cpu
in another.