transformer-deploy

Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀

Apache-2.0 License

Stars
1.6K


transformer-deploy - Add GPT-2 acceleration support (latest release)

Published by pommedeterresautee over 2 years ago

  • add support for decoder-based models (GPT-2) on both ONNX Runtime and TensorRT
  • refactor Triton configuration generation (simplification)
  • add GPT-2 model documentation (notebook)
  • fix the CPU quantization benchmark (it was not using the quantized model)
  • fix a sentence-transformers bug
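Decoder models like GPT-2 generate text autoregressively, re-running the forward pass once per emitted token, which is why accelerating each pass with ONNX Runtime or TensorRT pays off. A minimal sketch of that loop, with a stand-in `next_logits` callback in place of a real exported model (all names here are illustrative, not transformer-deploy's API):

```python
def greedy_decode(next_logits, prompt, max_new_tokens, eos_id):
    """Autoregressive greedy decoding: one forward pass per generated token.

    next_logits: callable mapping the current token list to a list of
    vocabulary logits for the next position (in practice, an ONNX Runtime
    or TensorRT session wrapping the exported decoder).
    """
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_logits(tokens)
        # Greedy strategy: pick the highest-scoring vocabulary id.
        next_id = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens
```

Because latency grows with the number of generated tokens, even small per-pass speedups compound over a whole sequence.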
transformer-deploy - add CPU support and generic GPU quantization support

Published by pommedeterresautee almost 3 years ago

What's Changed

Full Changelog: https://github.com/ELS-RD/transformer-deploy/compare/v0.2.0...v0.3.0

transformer-deploy - add GPU quantization support

Published by pommedeterresautee almost 3 years ago

  • add support for INT8 GPU quantization
  • add an end-to-end quantization tutorial
  • add the QDQRoberta model
  • switch to ONNX opset 13
  • refactor TensorRT engine creation
  • fix bugs
  • add auth token support (for private Hugging Face repositories)
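INT8 quantization maps floating-point tensors onto 8-bit integers through a scale factor; the explicit quantize/dequantize (QDQ) nodes that models such as QDQRoberta carry make this mapping visible in the graph so TensorRT can fuse it into INT8 kernels. The underlying arithmetic, sketched in plain Python with a symmetric per-tensor scheme (function names are illustrative, not the library's API):

```python
def quantize_symmetric_int8(values):
    """Map floats to int8 codes using a symmetric per-tensor scale.

    The scale is chosen so the largest magnitude maps to 127 -- the same
    role calibration plays when quantizing real activations.
    """
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate floats from the int8 codes."""
    return [x * scale for x in q]
```

A quantize/dequantize round trip introduces at most `scale / 2` of rounding error per element, which is why calibrating the scale on representative data matters for accuracy.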

What's Changed

New Contributors

Full Changelog: https://github.com/ELS-RD/transformer-deploy/compare/v0.1.1...v0.2.0

transformer-deploy - update Triton image to 21.11-py3

Published by pommedeterresautee almost 3 years ago

  • update Docker image
  • update documentation
transformer-deploy - from PoC to library

Published by pommedeterresautee almost 3 years ago

  • switch from a proof of concept to a library
  • add support for the TensorRT Python API (for best performance)
  • improve documentation (move the Hugging Face Infinity comparison out of the main doc, add benchmarks, etc.)
  • fix issues with mixed precision
  • add a license
  • add tests, GitHub Actions, and a Makefile
  • change the way the Docker image is built
transformer-deploy - first release

Published by pommedeterresautee almost 3 years ago

All the scripts to reproduce the results of https://medium.com/p/e1be0057a51c