


A CLI tool to prepare your PyTorch models for efficient inference. The only prerequisite is a model trained and saved with torch.save(model, model_path). See for an example.

Be warned: torchprep is an experimental tool, so expect bugs, deprecations and limitations. That said, if you like the project and would like to improve it, please open a GitHub issue!

Install from source

Create a virtual environment

sudo apt-get install python3-venv
python3 -m venv venv
source venv/bin/activate

Install Poetry

python3 -m pip install -U pip
python3 -m pip install -U setuptools
pip install poetry

Install torchprep

cd torchprep
poetry install

Install from PyPI

pip install torchprep


torchprep quantize --help


# Install example dependencies
pip install torchvision transformers

# Download the ResNet and BERT examples
python tests/

# quantize a CPU model to int8 and profile it with a float tensor of shape [64,3,7,7]
torchprep quantize models/ int8 --input-shape 64,3,7,7
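For intuition about what int8 means numerically, here is a minimal plain-Python sketch of per-tensor affine (asymmetric) int8 quantization, assuming a single scale and zero point per tensor. It is an illustration only, not torchprep's implementation; the scheme PyTorch actually applies to a model may differ (for example, per-channel scales).

```python
def quantize_int8(values):
    """Map floats to int8 using one scale and zero point for the whole tensor."""
    # The representable range must include 0 so that zero maps exactly.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    # 256 int8 levels give 255 steps; fall back to 1.0 for an all-zero tensor.
    scale = (hi - lo) / 255 or 1.0
    zero_point = round(-128 - lo / scale)
    # Round to the nearest level and clamp into the int8 range [-128, 127].
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_int8(q, scale, zero_point):
    """Recover approximate floats from int8 codes."""
    return [(x - zero_point) * scale for x in q]


if __name__ == "__main__":
    q, scale, zero_point = quantize_int8([-1.0, -0.5, 0.0, 0.25, 1.0])
    print(q, dequantize_int8(q, scale, zero_point))
```

Each value is stored in one byte instead of four, and the round-trip error stays within about half a quantization step.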


To profile a model, create a YAML file describing your model's input shape. The YAML file can describe multiple inputs.

# resnet.yaml
  dtype: "int8"
  device: "cpu"
  shape: [16, 3, 7, 7] # the first element is the batch size

Then pass the YAML file to torchprep:

# profile a model for 100 iterations
torchprep profile models/ --iterations 100 --device cpu --input-shape config/resnet.yaml
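Latency profiling of this kind boils down to a timed loop. The stdlib-only sketch below is an illustration, not torchprep's code: fn stands for one forward pass, and the warm-up count and the statistics reported (mean, median, p99) are assumptions. When timing a GPU model you would also need to call torch.cuda.synchronize() inside the timed region, because CUDA kernels run asynchronously.

```python
import math
import statistics
import time


def profile_latency(fn, iterations=100, warmup=10):
    """Time repeated calls to fn() and return latency statistics in milliseconds."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    for _ in range(warmup):  # warm-up calls are not recorded
        fn()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    p99_index = math.ceil(0.99 * len(samples)) - 1  # nearest-rank percentile
    return {
        "iterations": iterations,
        "mean_ms": statistics.mean(samples),
        "p50_ms": statistics.median(samples),
        "p99_ms": samples[p99_index],
    }


if __name__ == "__main__":
    # Stand-in workload; for a real model this would be e.g. lambda: model(example_input)
    print(profile_latency(lambda: sum(range(10_000)), iterations=100))
```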

# set OpenMP threads (OMP_NUM_THREADS) to 1 to optimize CPU inference
torchprep env --device cpu
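Doing the same by hand means setting OMP_NUM_THREADS before the process that loads PyTorch starts, since the OpenMP runtime reads it when it initializes. Below is a minimal sketch that launches a child Python process with the variable set. The helper name single_threaded_env is hypothetical, and any other variables torchprep may set are not covered. Inside an already running process, torch.set_num_threads(1) is the runtime alternative.

```python
import os
import subprocess
import sys


def single_threaded_env(base=None):
    """Return a copy of the environment with OpenMP limited to one thread."""
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = "1"
    return env


if __name__ == "__main__":
    # The child sees the variable before any OpenMP runtime (e.g. torch) loads.
    out = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"],
        env=single_threaded_env(),
        capture_output=True,
        text=True,
        check=True,
    )
    print(out.stdout.strip())  # prints "1"
```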

# Prune 30% of model weights
torchprep prune models/ --prune-amount 0.3
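The prune command zeroes out small weights by L1 norm. For a flat list of weights, unstructured L1 (magnitude) pruning reduces to the sketch below: rank weights by absolute value and zero the smallest amount fraction. This is an illustration only. PyTorch ships the tensor version as torch.nn.utils.prune.l1_unstructured, but whether torchprep uses it, and whether it prunes per layer or globally, is not stated here.

```python
def l1_prune(weights, amount):
    """Zero out the `amount` fraction of weights with the smallest absolute value."""
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be in [0, 1]")
    k = round(amount * len(weights))  # number of weights to zero
    # Indices ordered by magnitude, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    # Build a new list so the input weights are left untouched.
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


if __name__ == "__main__":
    w = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2, -0.4, 0.6, 0.02, -0.8]
    print(l1_prune(w, 0.3))  # the three smallest-magnitude weights become 0.0
```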

Available commands

Usage: torchprep [OPTIONS] COMMAND [ARGS]...

Options:
  --install-completion  Install completion for the current shell.
  --show-completion     Show completion for the current shell, to copy it or
                        customize the installation.
  --help                Show this message and exit.

Commands:
  distill        Create a smaller student model by setting a distillation...
  prune          Zero out small model weights using l1 norm
  env-variables  Set environment variables for optimized inference.
  fuse           Supports optimizations including conv/bn fusion, dropout...
  profile        Profile model latency 
  quantize       Quantize a saved torch model to a lower precision float...
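The fuse command lists conv/bn fusion. In eval mode, batch norm is a fixed affine transform, so it can be folded into the preceding convolution: with factor = gamma / sqrt(running_var + eps), the fused weight is weight * factor and the fused bias is (bias - running_mean) * factor + beta. The sketch below checks that arithmetic on scalars standing in for one output channel. It is not torchprep's code; PyTorch offers module-level fusion through utilities such as torch.ao.quantization.fuse_modules.

```python
import math


def fold_batchnorm(weight, bias, gamma, beta, running_mean, running_var, eps=1e-5):
    """Return (weight, bias) of one layer equivalent to conv followed by eval-mode BN.

    Scalars stand in for a single output channel; a real conv multiplies every
    weight in that channel's filter by the same factor.
    """
    factor = gamma / math.sqrt(running_var + eps)
    return weight * factor, (bias - running_mean) * factor + beta


def conv_then_bn(x, weight, bias, gamma, beta, running_mean, running_var, eps=1e-5):
    """Reference: apply the scalar 'conv' and then batch norm separately."""
    y = weight * x + bias
    return gamma * (y - running_mean) / math.sqrt(running_var + eps) + beta


if __name__ == "__main__":
    params = dict(gamma=1.2, beta=-0.3, running_mean=0.05, running_var=0.8)
    fused_w, fused_b = fold_batchnorm(0.5, 0.1, **params)
    print(fused_w * 3.5 + fused_b, conv_then_bn(3.5, 0.5, 0.1, **params))
```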

Usage instructions for a command

torchprep <command> --help

Usage: torchprep quantize [OPTIONS] MODEL_PATH PRECISION:{int8|float16}

  Quantize a saved torch model to a lower precision float format to reduce its
  size and latency

Arguments:
  MODEL_PATH                [required]
  PRECISION:{int8|float16}  [required]

Options:
  --device [cpu|gpu]  [default: Device.cpu]
  --input-shape TEXT  Comma separated input tensor shape
  --help              Show this message and exit.
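The --input-shape option takes a comma-separated shape such as 64,3,7,7. Below is a hypothetical helper showing one way to turn that string into dimensions; torchprep's own parsing and validation may differ. As in the YAML example, the first dimension is conventionally the batch size, and in PyTorch the resulting tuple could be passed to torch.randn(*shape) to build a dummy input.

```python
def parse_input_shape(text):
    """Turn a comma-separated shape such as "64,3,7,7" into a tuple of positive ints."""
    try:
        # int() tolerates surrounding spaces, so "64, 3, 7, 7" is accepted too.
        dims = tuple(int(part) for part in text.split(","))
    except ValueError:
        raise ValueError(f"invalid input shape: {text!r}") from None
    if any(d <= 0 for d in dims):
        raise ValueError(f"dimensions must be positive: {text!r}")
    return dims


if __name__ == "__main__":
    batch_size, *sample_shape = parse_input_shape("64,3,7,7")
    print(batch_size, sample_shape)
```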

Dev instructions

Run tests

pytest --disable-pytest-warnings

Create binaries

To create binaries and test them locally:

poetry build
pip install --user /path/to/wheel

Upload to PyPI

poetry config pypi-token.pypi <SECRET_KEY>
poetry publish --build


  • Support adding custom model names and output paths
  • Support multiple input tensors for models like BERT that expect a batch size and sequence length
  • Support multiple input tensor types
  • Print environment variables
  • TensorRT
  • IPEX

Short term

  • Integrate into the universal benchmark tool serve/benchmarks
  • Automatic distillation example: reduce parameter count by 1/3 with torchprep distill 1/3
  • Training aware optimizations

Medium term

Related Projects