https://github.com/LambdaLabsML/distributed-training-guide

Best practices & guides on how to write distributed PyTorch training code

MIT License

Stars: 190


Ecosystems: CUDA, PyTorch

Distributed Training Guide

This guide aims to be a comprehensive resource on best practices for distributed training, diagnosing errors, and fully utilizing all available resources.

Questions this guide answers:

How do I update a single-GPU training/fine-tuning script to run on multiple GPUs or multiple nodes?
How do I diagnose hangs/errors that happen during training?
My model/optimizer is too big for a single GPU - how do I train/fine-tune it on my cluster?
How do I schedule/launch training on a cluster?
How do I scale my hyperparameters when increasing the number of workers?

Best practices for logging stdout/stderr and wandb are also included, as logging is vitally important in diagnosing/debugging training runs on a cluster.
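
As a rough illustration of why per-worker logging helps on a cluster, the sketch below gives every worker its own log file and tags each line with the worker's rank, so output from different processes does not interleave. It assumes the launcher exports a RANK environment variable (torchrun does this). The helper name setup_rank_logging, the logs directory, and the file naming are hypothetical choices for this sketch and are not taken from the guide's scripts.

```python
import logging
import os


def setup_rank_logging(log_dir="logs"):
    """Create a logger that writes to a per-rank file (hypothetical helper)."""
    # Assumption: the launcher (e.g. torchrun) exports RANK for each worker.
    # Fall back to 0 so the same code works in a single-process run.
    rank = int(os.environ.get("RANK", "0"))
    os.makedirs(log_dir, exist_ok=True)

    logger = logging.getLogger(f"train.rank{rank}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(os.path.join(log_dir, f"rank{rank}.log"))
        handler.setFormatter(
            logging.Formatter(f"[%(asctime)s] [rank={rank}] %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger, rank


if __name__ == "__main__":
    logger, rank = setup_rank_logging()
    logger.info("worker started")
```

The returned rank can also be used to restrict experiment-tracker calls to a single worker; that is a common pattern, but whether the guide does exactly this is an assumption here.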

How to read

This guide is organized into sequential chapters, each containing a README.md and a train_llm.py script. The README discusses the changes introduced in that chapter and goes into more detail.

Each training script trains a causal language model (i.e. a GPT-style model).
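
For readers new to the term, a causal language model predicts each token from the tokens that precede it. The sketch below shows that training objective on plain Python lists; it is only an illustration of the idea, not code from train_llm.py, and the function name next_token_pairs and the token ids are made up for the example.

```python
def next_token_pairs(token_ids):
    """Return (context, target) pairs for next-token prediction."""
    pairs = []
    # The token at position t is predicted from tokens 0..t-1 only,
    # never from tokens that come after it.
    for t in range(1, len(token_ids)):
        pairs.append((token_ids[:t], token_ids[t]))
    return pairs


if __name__ == "__main__":
    tokens = [5, 8, 2, 9]  # hypothetical token ids
    for context, target in next_token_pairs(tokens):
        print(context, "->", target)
```

A real training script would compute these targets for a whole batch at once by shifting the input sequence one position, rather than building the pairs one by one.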

Set up

Clone this repo

git clone https://github.com/LambdaLabsML/distributed-training-guide.git

Virtual Environment

cd distributed-training-guide
python3 -m venv venv
source venv/bin/activate
python -m pip install -U pip
pip install -U setuptools wheel
pip install -r requirements.txt

wandb

This tutorial uses wandb as an experiment tracker.

wandb login
