torchprep
A CLI tool to prepare your PyTorch models for efficient inference. The only prerequisite is a model trained and saved with torch.save(model, model_path). See example.py for an example.
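
For example, a model file torchprep can consume might be produced like this (a minimal sketch using torchvision, mirroring the resnet152 example below; note that newer torchvision releases take a weights= argument instead of pretrained=):

import torch
import torchvision

# load a pretrained model (resnet152 matches the example used below)
model = torchvision.models.resnet152(pretrained=True)
model.eval()

# torchprep expects the whole model object saved with torch.save,
# not just a state_dict
torch.save(model, "models/resnet152.pt")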

Be warned: torchprep is an experimental tool, so expect bugs, deprecations, and limitations. That said, if you like the project and would like to improve it, please open a GitHub issue!

Install from source

Create a virtual environment

apt-get install python3-venv
python3 -m venv venv
source venv/bin/activate

Install poetry

sudo python3 -m pip install -U pip
sudo python3 -m pip install -U setuptools
pip install poetry

Install torchprep

cd torchprep
poetry install

Install from PyPI

pip install torchprep

Usage

torchprep quantize --help

Example

# Install example dependencies
pip install torchvision transformers

# Download the resnet and bert examples
python tests/download_example.py

# quantize a model to int8 on the cpu (the default device)
torchprep quantize models/resnet152.pt int8

Profile

To profile a model you need to create a YAML file describing your model's input shape. The YAML can accept multiple inputs.

# resnet.yaml
input:
  dtype: "int8"
  device: "cpu"
  shape: [16, 3, 7, 7] # the first element is the batch size

Then you can pass the YAML file to torchprep:

# profile a model for 100 iterations
torchprep profile models/resnet152.pt --iterations 100 --device cpu --input-shape config/resnet.yaml

# set OMP_NUM_THREADS=1 to optimize cpu inference
torchprep env --device cpu

# Prune 30% of model weights
torchprep prune models/resnet152.pt --prune-amount 0.3
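
Under the hood, pruning by L1 norm can be expressed with PyTorch's built-in utilities. A minimal sketch of the equivalent operation (not necessarily torchprep's exact implementation, and the output path here is hypothetical):

import torch
import torch.nn.utils.prune as prune

model = torch.load("models/resnet152.pt")

# zero out the 30% of weights with the smallest L1 magnitude,
# layer by layer, for every conv and linear module
for module in model.modules():
    if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruning permanent

torch.save(model, "models/resnet152-pruned.pt")  # hypothetical output path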

Available commands

Usage: torchprep [OPTIONS] COMMAND [ARGS]...

Options:
  --install-completion  Install completion for the current shell.
  --show-completion     Show completion for the current shell, to copy it or
                        customize the installation.
  --help                Show this message and exit.

Commands:
  distill        Create a smaller student model by setting a distillation...
  prune          Zero out small model weights using l1 norm
  env-variables  Set environment variables for optimized inference.
  fuse           Supports optimizations including conv/bn fusion, dropout...
  profile        Profile model latency 
  quantize       Quantize a saved torch model to a lower precision float...

Usage instructions for a command

torchprep <command> --help

Usage: torchprep quantize [OPTIONS] MODEL_PATH PRECISION:{int8|float16}

  Quantize a saved torch model to a lower precision float format to reduce its
  size and latency

Arguments:
  MODEL_PATH                [required]
  PRECISION:{int8|float16}  [required]

Options:
  --device [cpu|gpu]  [default: Device.cpu]
  --input-shape TEXT  Comma separated input tensor shape
  --help              Show this message and exit.
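
For reference, int8 quantization on the CPU is commonly done with PyTorch's dynamic quantization. A minimal sketch of the general technique (not necessarily torchprep's exact implementation, and the output path here is hypothetical):

import torch

model = torch.load("models/resnet152.pt")
model.eval()

# dynamic quantization: weights are converted to int8 ahead of time,
# activations are quantized on the fly at inference time
quantized = torch.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear},  # layer types to quantize
    dtype=torch.qint8,
)

torch.save(quantized, "models/resnet152-int8.pt")  # hypothetical output path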

Dev instructions

Run tests

pytest --disable-pytest-warnings

Create binaries

To create binaries and test them out locally:

poetry build
pip install --user /path/to/wheel

Upload to PyPI

poetry config pypi-token.pypi <SECRET_KEY>
poetry publish --build

Roadmap

  • Support custom model names and output paths
  • Support multiple input tensors for models like BERT that expect a batch size and sequence length
  • Support multiple input tensor types
  • Print environment variables
  • TensorRT
  • IPEX

Short term

  • Integrate into universal benchmark tool serve/benchmarks
  • Automatic distillation, e.g. reduce parameter count by 1/3: torchprep distill model.pt 1/3
  • Training aware optimizations
