Serve, optimize and scale PyTorch models in production
APACHE-2.0 License
This is the release of TorchServe v0.11.1.
Highlights include torch.compile configuration and the tensorrt & hpu backends.
Ubuntu 20.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe requires Python >= 3.8 and JDK17.
TorchServe version | PyTorch version | Python | Stable CUDA | Experimental CUDA |
---|---|---|---|---|
0.11.1 | 2.3.0 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
0.11.0 | 2.3.0 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
0.10.0 | 2.2.1 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
0.9.0 | 2.1 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
0.8.0 | 2.0 | >=3.8, <=3.11 | CUDA 11.7, CUDNN 8.5.0.96 | CUDA 11.8, CUDNN 8.7.0.84 |
0.7.0 | 1.13 | >=3.7, <=3.10 | CUDA 11.6, CUDNN 8.3.2.44 | CUDA 11.7, CUDNN 8.5.0.96 |
TorchServe version | PyTorch version | Python | Neuron SDK |
---|---|---|---|
0.11.1 | 2.1 | >=3.8, <=3.11 | 2.18.2+ |
0.11.0 | 2.1 | >=3.8, <=3.11 | 2.18.2+ |
0.10.0 | 1.13 | >=3.8, <=3.11 | 2.16+ |
0.9.0 | 1.13 | >=3.8, <=3.11 | 2.13.2+ |
Published by lxning 5 months ago
This is the release of TorchServe v0.11.0.
torch.compile with OpenVINO backend for Stable Diffusion: example showcase of the openvino torch.compile backend with Stable Diffusion #3116 @suryasidd
TorchServe adds support for linux-aarch64 and shows an example working on AWS Graviton. This provides users with a new platform alternative for serving models on CPU.
With the XGBoost Classifier example, we show how to deploy any pickled model with TorchServe.
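The gist of that approach, as a hedged sketch (the class name, model.pkl file name, and JSON input format below are illustrative assumptions, not what the example mandates):

```python
# Minimal sketch of serving a pickled (non-PyTorch) model from a custom
# TorchServe handler. "model.pkl" and the JSON list-of-features input format
# are assumptions for illustration only.
import json
import os
import pickle


class PickledModelHandler:
    def initialize(self, context):
        model_dir = context.system_properties.get("model_dir")
        with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
            self.model = pickle.load(f)

    def handle(self, data, context):
        # One JSON payload per request in the batch; return one prediction each.
        features = [json.loads(row.get("body") or row.get("data")) for row in data]
        predictions = self.model.predict(features)
        return [str(p) for p in predictions]
```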
The ability to bypass allowed_urls using relative paths has been fixed by ensuring a preemptive check for relative paths prior to copying the model archive to the model store directory. Also, the default gRPC inference and management addresses are now set to localhost (127.0.0.1) to reduce the scope of default access to the gRPC endpoints.
Ubuntu 20.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above, and JDK17.
TorchServe version | PyTorch version | Python | Stable CUDA | Experimental CUDA |
---|---|---|---|---|
0.11.0 | 2.3.0 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
0.10.0 | 2.2.1 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
0.9.0 | 2.1 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
0.8.0 | 2.0 | >=3.8, <=3.11 | CUDA 11.7, CUDNN 8.5.0.96 | CUDA 11.8, CUDNN 8.7.0.84 |
0.7.0 | 1.13 | >=3.7, <=3.10 | CUDA 11.6, CUDNN 8.3.2.44 | CUDA 11.7, CUDNN 8.5.0.96 |
TorchServe version | PyTorch version | Python | Neuron SDK |
---|---|---|---|
0.11.0 | 2.1 | >=3.8, <=3.11 | 2.18.2+ |
0.10.0 | 1.13 | >=3.8, <=3.11 | 2.16+ |
0.9.0 | 1.13 | >=3.8, <=3.11 | 2.13.2+ |
Published by lxning 7 months ago
This is the release of TorchServe v0.10.0.
Highlights include torch.compile showcase examples.
TorchServe presented the experimental C++ backend at the PyTorch Conference 2022. Similar to the Python backend, the C++ backend also runs as a process and utilizes the BaseHandler to define APIs for customizing the handler. By providing a backend and handler written in pure C++ for TorchServe, it is now possible to deploy PyTorch models without any Python overhead. This release officially promoted the experimental branch to the master branch and included additional examples and Docker images for development.
With the launch of PT2 Inference at the PyTorch Conference 2023, we have added several key examples showcasing out-of-the-box speedups for torch.compile and AOT Compile. Since there is no new development being done in TorchScript, starting with this release, TorchServe is preparing the migration path for customers to switch from TorchScript to torch.compile.
The fast series of GenAI models (GPTFast, SegmentAnythingFast, DiffusionFast) achieve 3-10x speedups using torch.compile and native PyTorch optimizations.
To address cold start problems, an example is included to show how torch._export.aot_load (an experimental API) can be used to load a pre-compiled model, as sketched below. TorchServe has also started benchmarking models with torch.compile and tracking their performance compared to TorchScript.
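A minimal sketch of that flow, assuming a model already compiled ahead of time with torch._export.aot_compile (both APIs are experimental, so exact signatures may differ between PyTorch releases; the .so file name is an assumption):

```python
import torch

# Hypothetical artifact produced ahead of time with torch._export.aot_compile.
so_path = "resnet18_aot_compiled.so"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the pre-compiled model, avoiding compilation on the first request (cold start).
compiled_model = torch._export.aot_load(so_path, device)

with torch.inference_mode():
    output = compiled_model(torch.randn(1, 3, 224, 224, device=device))
    print(output.shape)
```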
The new TorchServe C++ backend also includes torch.compile and AOTInductor related examples for ResNet50, BERT and Llama2.
torch.compile
a. Example torch.compile with image classifier model densenet161 #2915 @agunapal
b. Example torch._export.aot_compile with image classification model ResNet-18 #2832 #2906 #2932 #2948 @agunapal
c. Example torch inductor fx graph caching with image classification model densenet161 #2925 @agunapal
C++ AOTInductor
a. Example AOT Inductor with Llama2 #2913 @mreso
b. Example AOT Inductor with ResNet-50 #2944 @lxning
c. Example AOT Inductor with BERTSequenceClassification #2931 @lxning
TorchServe has implemented token authentication for management and inference APIs. This is an optional config and can be enabled using the torchserve-endpoint-plugin, which can be downloaded from Maven. This further strengthens TorchServe's capability as a secure model serving solution. The security features of TorchServe are documented here.
TorchServe is now supported on Apple Silicon Macs. The current support is for CPU only. We have also posted an RFC for the deprecation of x86 Mac support.
While serving large models, model loading can take some time even though the pod is running. Even though TorchServe is up, the worker is not ready until the model is loaded. To address this, TorchServe now sets the model ready status in KServe after the model has been loaded on workers. TorchServe also includes native open inference protocol support in gRPC. This is an experimental feature.
In order to extend backwards compatibility support for metrics, auto-detection of backend metrics enables the flexibility to publish custom model metrics without having to explicitly specify them in the metrics configuration file. Furthermore, a customized script to collect system metrics is also now supported.
Ubuntu 20.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above, and JDK17.
TorchServe version | PyTorch version | Python | Stable CUDA | Experimental CUDA |
---|---|---|---|---|
0.10.0 | 2.2.1 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
0.9.0 | 2.1 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
0.8.0 | 2.0 | >=3.8, <=3.11 | CUDA 11.7, CUDNN 8.5.0.96 | CUDA 11.8, CUDNN 8.7.0.84 |
0.7.0 | 1.13 | >=3.7, <=3.10 | CUDA 11.6, CUDNN 8.3.2.44 | CUDA 11.7, CUDNN 8.5.0.96 |
TorchServe version | PyTorch version | Python | Neuron SDK |
---|---|---|---|
0.10.0 | 1.13 | >=3.8, <=3.11 | 2.16+ |
0.9.0 | 1.13 | >=3.8, <=3.11 | 2.13.2+ |
Published by lxning about 1 year ago
This is the release of TorchServe v0.9.0.
Our security process is documented here
We rely heavily on automation to improve the security of torchserve, namely by keeping our gradle and pip dependencies updated.
A key point to remember is that torchserve will allow you to configure things in an insecure way, so make sure to read our security docs and relevant security warnings to ensure your product is secure in production. In general, we do not encourage you to download untrusted mar files from the internet; running a .mar file is effectively running arbitrary Python code, so make sure to unzip mar files and validate that they are not doing anything suspicious.
Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above, and JDK17.
Torch 2.1.0 + Cuda 11.8, 12.1
Torch 2.0.1 + Cuda 11.7
Torch 2.0.0 + Cuda 11.7
Torch 1.13 + Cuda 11.7
Torch 1.11 + Cuda 10.2, 11.3, 11.6
Torch 1.9.0 + Cuda 11.1
Torch 1.8.1 + Cuda 9.2
Published by lxning about 1 year ago
This is the release of TorchServe v0.8.2.
add_metric is now backwards compatible with versions [< v0.6.1], but the default metric type is inferred to be COUNTER. If the metric is of a different type, it needs to be specified in the call to add_metric, for example:
metrics.add_metric(name='GenericMetric', value=10, unit='count', dimensions=[...], metric_type=MetricTypes.GAUGE)
Where applicable, replace the call to add_metric with add_metric_to_cache.
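As a hedged sketch, a custom handler might emit such a metric as below (the metric name and empty dimension list are illustrative assumptions; the import path should be checked against the TorchServe version in use):

```python
# Sketch of emitting a custom GAUGE metric from a handler's inference path.
# "QueueDepth" is an illustrative metric name, not one TorchServe defines.
from ts.metrics.metric_type_enum import MetricTypes
from ts.torch_handler.base_handler import BaseHandler


class MyHandler(BaseHandler):
    def inference(self, data, *args, **kwargs):
        result = super().inference(data, *args, **kwargs)
        metrics = self.context.metrics
        metrics.add_metric(
            name="QueueDepth",
            value=len(data),
            unit="count",
            dimensions=[],
            metric_type=MetricTypes.GAUGE,
        )
        return result
```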
Example LLama v2 70B chat using HuggingFace Accelerate #2494 @lxning @HamidShojanazeri @agunapal
large model example OPT-6.7B on Inferentia2 #2399 @namannandan
DeepSpeed deferred init with OPT-30B #2419 @agunapal
Deferred model init in the OPT-30B example by leveraging the new DeepSpeed version. This feature can significantly reduce model loading latency.
Torch TensorRT example #2483 @agunapal
K8S mnist example using minikube #2323 @agunapal
Example for custom metrics #2516 @namannandan
Example for object detection with ultralytics YOLO v8 model #2508 @agunapal
nvidia/cuda:11.7.1-base-ubuntu20.04 in GPU docker image #2442 @agunapal
Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above, and JDK17.
Torch 2.0.1 + Cuda 11.7, 11.8
Torch 2.0.0 + Cuda 11.7, 11.8
Torch 1.13 + Cuda 11.7, 11.8
Torch 1.11 + Cuda 10.2, 11.3, 11.6
Torch 1.9.0 + Cuda 11.1
Torch 1.8.1 + Cuda 9.2
Published by lxning over 1 year ago
This is the release of TorchServe v0.8.1.
Because pre- and post-processing are often carried out on the CPU, the GPU sits idle until the two CPU-bound steps have finished and the worker receives a new batch. Micro-batching in the handler makes it possible to run inference, pre-processing, and post-processing for a batch request from the frontend in parallel, as roughly illustrated below.
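A toy sketch of the idea (not the TorchServe micro-batching API): split a batch into micro-batches and overlap the CPU-bound stages with GPU-bound inference.

```python
# Toy sketch: overlap CPU-bound pre/post-processing with GPU-bound inference
# by processing the batch as smaller micro-batches in a small thread pool.
from concurrent.futures import ThreadPoolExecutor


def micro_batched_handle(batch, preprocess, infer, postprocess, micro_batch_size=8):
    micro_batches = [batch[i:i + micro_batch_size]
                     for i in range(0, len(batch), micro_batch_size)]
    results, post_futures = [], []
    with ThreadPoolExecutor(max_workers=2) as cpu_pool:
        # Preprocess all micro-batches in the background.
        pre_futures = [cpu_pool.submit(preprocess, mb) for mb in micro_batches]
        for fut in pre_futures:
            out = infer(fut.result())                              # GPU-bound step
            post_futures.append(cpu_pool.submit(postprocess, out))  # overlaps next infer
        for fut in post_futures:
            results.extend(fut.result())
    return results
```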
This feature helps with use cases where inference latency can be high, such as generative models and auto-regressive decoder models like ChatGPT. Based on business requirements, applications can take effective actions, for example routing the rejected request to a different server or scaling up model server capacity.
This example demonstrates creative content assisted by generative AI by using TorchServe on SageMaker MME.
Upgraded to PyTorch 2.0.1 #2374 @namannandan
Significant reduction in Docker Image Size
GPU
pytorch/torchserve 0.8.1-gpu 04eef250c14e 4 hours ago 2.34GB
pytorch/torchserve 0.8.0-gpu 516bb13a3649 4 weeks ago 5.86GB
pytorch/torchserve 0.6.0-gpu fb6d4b85847d 12 months ago 2.13GB
CPU
pytorch/torchserve 0.8.1-cpu 68a3fcae81af 4 hours ago 662MB
pytorch/torchserve 0.8.0-cpu 958ef6dacea2 4 weeks ago 2.37GB
pytorch/torchserve 0.6.0-cpu af91330a97bd 12 months ago 496MB
Updated CPU information for IPEX #2372 @min-jean-cho
Fixed inf2 example handler #2378 @namannandan
Added inf2 nightly benchmark #2283 @namannandan
Fixed archiver tgz format model directory structure mismatch on SageMaker #2405 @lxning
Fixed model archiver to fail if extra files are missing #2212 @mreso
Fixed device type setting in model config yaml #2408 @lxning
Fixed batchsize in config.properties not honored #2382 @lxning
Upgraded torchrun argument names and fixed backend tcp port connection #2377 @lxning
Fixed error thrown while loading multiple models in KServe #2235 @jagadeeshi2i
Fixed KServe fastapi migration issues #2175 @jagadeeshi2i
Added type annotation in model_server.py #2384 @josephcalise
Speed up unit test by removing sleep in start/stop torchserve #2383 @mreso
Removed cu118 from regression tests #2380 @agunapal
Enabled ONNX CI test #2363 @msaroufim
Removed session_mocker usage to prevent test cross talking #2375 @mreso
Enabled regression test in CI #2370 @msaroufim
Fixed regression test failures #2371 @namannandan
Bump up transformers version from 4.28.1 to 4.30.0 #2410
Fixed links in FAQ #2351 @sekyondaMeta
Fixed broken links in index.md #2329 @sekyondaMeta
Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above, and JDK17.
Torch 2.0.1 + Cuda 11.7, 11.8
Torch 2.0.0 + Cuda 11.7, 11.8
Torch 1.13 + Cuda 11.7, 11.8
Torch 1.11 + Cuda 10.2, 11.3, 11.6
Torch 1.9.0 + Cuda 11.1
Torch 1.8.1 + Cuda 9.2
Published by lxning over 1 year ago
This is the release of TorchServe v0.8.0.
TorchServe added deep integration to support large model inference. It provides a PyTorch-native large model inference solution by integrating PiPPy. It also provides the flexibility and extensibility to support other popular libraries such as Microsoft DeepSpeed and HuggingFace Accelerate.
To improve UX in Generative AI inference, TorchServe allows sending intermediate token responses to the client side by supporting gRPC server-side streaming and HTTP 1.1 chunked encoding.
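In a custom handler this looks roughly like the following hedged sketch (based on the streaming example; the helper's location and signature should be checked against the TorchServe version in use, and the token loop stands in for a real generation loop):

```python
# Sketch of streaming partial results from a custom handler method.
from ts.protocol.otf_message_handler import send_intermediate_predict_response


def stream_tokens(tokens, context):
    # Push every token except the last as an intermediate response over
    # gRPC server-side streaming or HTTP 1.1 chunked encoding.
    for token in tokens[:-1]:
        send_intermediate_predict_response(
            [token], context.request_ids, "Intermediate Prediction success", 200, context
        )
    return [tokens[-1]]  # the final chunk goes back through the normal return path
```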
By leveraging torch.compile, it's now possible to run TorchServe using XLA, which is optimized for both GPU and TPU deployments.
TorchServe fully supports metrics in Prometheus mode or Log mode. Both frontend and backend metrics can be configured in a central metrics YAML file.
Added a config-file option for model config to the model archiver tool. Users are able to flexibly define customized parameters in this YAML file and easily access them in the backend handler via the variable context.model_yaml_config. This new feature also makes it easier for TorchServe to support the other new features and enhancements.
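For example, a handler might read its own keys from that YAML (a minimal sketch; the "handler"/"my_param" keys are made-up names for illustration, not a required schema):

```python
# Sketch: reading values defined in model_config.yaml (passed to the archiver
# via --config-file) from inside a custom handler. Key names are illustrative.
from ts.torch_handler.base_handler import BaseHandler


class MyHandler(BaseHandler):
    def initialize(self, context):
        super().initialize(context)
        yaml_config = getattr(context, "model_yaml_config", {}) or {}
        self.my_param = yaml_config.get("handler", {}).get("my_param", "default")
```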
We've refactored our model optimization utilities and improved logging to help debug compilation issues. We've also deprecated compile.json in favor of the new YAML config format; follow our guide to learn more: https://github.com/pytorch/serve/blob/master/examples/pt2/README.md. The main difference is that while archiving a model, instead of passing in compile.json via --extra-files, we can pass in --config-file model_config.yaml.
By default, TorchServe uses a round-robin algorithm to assign GPUs to a worker on a host. Starting from v0.8.0, TorchServe allows users to define deviceIds in model_config.yaml to assign GPUs to a model.
TorchServe supports hybrid mode on a GPU host. Users are able to define deviceType in the model config YAML file to deploy a model on the CPU of a GPU host.
TorchServe allows users to define clientTimeoutInMills in a model config YAML file. If clientTimeoutInMills is set, TorchServe calculates the expiration timestamp of an incoming inference request and drops the request once it has expired.
Supported maxRetryTimeoutInSec, which defines the maximum time window for recovering a dead backend worker of a model, in the model config YAML file. The default value is 5 min, and users are able to adjust it in the model config YAML file. The ping endpoint returns 200 if all models have enough healthy workers (i.e., equal to or more than minWorkers); otherwise it returns 500.
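Taken together, the per-model settings described above map onto model_config.yaml keys along these lines, shown here as an equivalent Python dict for illustration (the values are assumptions, not defaults):

```python
# Illustrative values only; deviceType, deviceIds, clientTimeoutInMills and
# maxRetryTimeoutInSec are the model config keys described above.
model_config = {
    "deviceType": "gpu",           # or "cpu" to keep a model on the CPU of a GPU host
    "deviceIds": [0, 1],           # GPUs assigned to this model instead of round-robin
    "clientTimeoutInMills": 3000,  # drop requests that have waited longer than 3 s
    "maxRetryTimeoutInSec": 300,   # window for recovering a dead backend worker (default 5 min)
}
```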
Example of Pippy onboarding Open platform framework for distributed model inference #2215 @HamidShojanazeri
Example of DeepSpeed onboarding Open platform framework for distributed model inference #2218 @lxning
Example of Stable diffusion v2 #2009 @jagadeeshi2i
Upgraded to PyTorch 2.0 #2194 @agunapal
Enabled Core pinning in CPU nightly benchmark #2166 #2237 @min-jean-cho
TorchServe can be used with Intel® Extension for PyTorch* to give a performance boost on Intel hardware. Intel® Extension for PyTorch* is a Python package extending PyTorch with up-to-date features and optimizations that take advantage of AVX-512 Vector Neural Network Instructions (AVX512 VNNI), Intel® Advanced Matrix Extensions (Intel® AMX), and more.
Enabling core pinning in the TorchServe CPU nightly benchmark shows a significant performance speedup. This feature is implemented via a script under the PyTorch Xeon backend, initiated from Intel® Extension for PyTorch*. To try out core pinning on your workload, add cpu_launcher_enable=true in config.properties.
To try out more optimizations with Intel® Extension for PyTorch*, install Intel® Extension for PyTorch* and add ipex_enable=true in config.properties.
In case of OOM, return error code 507 instead of the generic code 503.
Fixed Error thrown in KServe while loading multi-models #2235 @jagadeeshi2i
Added Docker CI for TorchServe #2226 @fabridamicelli
Change docker image release from dev to production #2227 @agunapal
Supported building docker images with specified Python version #2154 @agunapal
Model archiver optimizations:
a). Added wildcard file search in model archiver --extra-files #2142 @gustavhartz
b). Added zip-store option to model archiver tool #2196 @mreso
c). Made model archiver tests runnable from any directory #2191 @mreso
d). Supported tgz format model decompression in TorchServe frontend #2214 @lxning
Automatically flag deviation of metrics from the average of last 30 runs
This study compares TPS between TorchServe with Nvidia MPS enabled and TorchServe without Nvidia MPS on P3 and G4 instances. It can help with the decision of whether to enable MPS for your deployment.
Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above, and JDK17.
Torch 2.0.0 + Cuda 11.7, 11.8
Torch 1.13 + Cuda 11.7, 11.8
Torch 1.11 + Cuda 10.2, 11.3, 11.6
Torch 1.9.0 + Cuda 11.1
Torch 1.8.1 + Cuda 9.2
Published by lxning over 1 year ago
This is the release of TorchServe v0.7.1.
Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above, and JDK17.
Torch 1.13 + Cuda 11.7
Torch 1.11 + Cuda 10.2, 11.3, 11.6
Torch 1.9.0 + Cuda 11.1
Torch 1.8.1 + Cuda 9.2
Published by lxning almost 2 years ago
This is the release of TorchServe v0.7.0.
Better Transformer / Flash Attention & Xformer Memory Efficient provide out-of-the-box performance with major speedups for PyTorch Transformer encoders. This has been integrated into the Torchserve HF Transformer example; please read more about this integration here.
The main speedups in Better Transformer come from exploiting sparsity on padded inputs and kernel fusions. As a result, you will see the biggest gains when dealing with larger workloads, such as sequences with longer padding and larger batch sizes.
In our benchmarks on P3 instances with 4 V100 GPUs, using Torchserve benchmarking workloads, throughput has shown significant improvement with large batch sizes: a 45.5% increase with batch size 8, 50.8% with batch size 16, 45.2% with batch size 32, 47.2% with batch size 64, and 17.2% with batch size 4. These numbers can vary based on your workload (batch size, padding percentage) and your hardware. Please look up some other benchmarks in the blog post.
torch.compile() support https://github.com/pytorch/serve/pull/1960 @msaroufim
We've added experimental support for PT 2.0, i.e. torch.compile() support within torchserve. To use it, you need to supply a file compile.json when archiving your model to specify which backend you want. We've also enabled mode=reduce-overhead by default, which is ideally suited for the smaller batch sizes that are more common for inference. For now, we recommend leveraging GPUs with tensor cores available, like A10G or A100, since you're likely to see the greatest speedups there.
On training we've seen speedups ranging from 30% to 2x (https://pytorch.org/get-started/pytorch-2.0/), but we haven't run any performance benchmarks for inference yet. Until then, we recommend you continue leveraging other runtimes like TensorRT or IPEX for accelerated inference, which we highlight in our performance_guide.md. There are a few important caveats to consider when using torch.compile: changes in batch sizes will cause recompilations, so make sure to leverage a small batch size; there will be additional overhead to start a model since you need to compile it first; and you'll likely still see the largest speedups with TensorRT.
However, we hope that adding this support will make it easier for you to benchmark and try out PT 2.0. Learn more here https://github.com/pytorch/serve/tree/master/examples/pt2
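Under the hood, this amounts to something like the following hedged sketch (the backend value would come from your compile.json; "inductor" is just a common choice, not stated by this release as the TorchServe default):

```python
import torch

# Compile an eager model with the inference-friendly mode described above.
# Keep the batch size fixed to avoid recompilations.
model = torch.nn.Linear(10, 10).eval()
compiled_model = torch.compile(model, backend="inductor", mode="reduce-overhead")

with torch.inference_mode():
    output = compiled_model(torch.randn(4, 10))
```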
Ubuntu 16.04, Ubuntu 18.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above, and JDK17.
Torch 1.13 + Cuda 11.7
Torch 1.11 + Cuda 10.2, 11.3, 11.6
Torch 1.9.0 + Cuda 11.1
Torch 1.8.1 + Cuda 9.2
Published by lxning almost 2 years ago
This is the release of TorchServe v0.6.1.
install_from_src.py https://github.com/pytorch/serve/pull/1856 @msaroufim
examples/intel_extension_for_pytorch/README.md https://github.com/pytorch/serve/pull/1816 @min-jean-cho
ci/benchmark/buildspec.yml https://github.com/pytorch/serve/pull/1658 @lxning
docker/Dockerfile.neuron.dev https://github.com/pytorch/serve/pull/1775 in favor of AWS SageMaker DLC @rohithkrn
LICENSE.txt https://github.com/pytorch/serve/pull/1801 @msaroufim
Ubuntu 16.04, Ubuntu 18.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above, and JDK17.
Torch 1.11+ Cuda 10.2, 11.3, 11.6
Torch 1.9.0 + Cuda 11.1
Torch 1.8.1 + Cuda 9.2
Published by lxning over 2 years ago
This is the release of TorchServe v0.6.0.
buildspec.yaml - Added fixing for gpu regression test buildspec.yaml.
benchmark/automated directory in favor of new Github Action based workflow.
benchmark-ab.py report.
torch < 1.8.1 - Added exception to notify torch < 1.8.1.
install_dependencies.py - Added sys.executable in install_dependencies.py.
model_zoo.md - Added dog breed, mmf and BERT in model zoo.
nvgpu in common requirements - Added nvgpu in common dependencies.
Ubuntu 16.04, Ubuntu 18.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above.
Torch 1.11+ Cuda 10.2, 11.3
Torch 1.9.0 + Cuda 11.1
Torch 1.8.1 + Cuda 9.2
Published by lxning over 2 years ago
This is the release of TorchServe v0.5.3.
pip install torchserve-nightly
Ubuntu 16.04, Ubuntu 18.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above.
Torch 1.10+ Cuda 10.2, 11.3
Torch 1.9.0 + Cuda 11.1
Torch 1.8.1 + Cuda 9.2
Published by lxning almost 3 years ago
This is a hotfix release for the Log4j issue.
Published by lxning almost 3 years ago
This is a hotfix release for the Log4j issue.
Published by lxning almost 3 years ago
This is the release of TorchServe v0.5.0.
TS_CONFIG_FILE as env var.
Ubuntu 16.04, Ubuntu 18.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04)
Torch 1.10+ Cuda 10.2, 11.3
Torch 1.9.0 + Cuda 11.1
Torch 1.8.1 + Cuda 9.2
Published by lxning about 3 years ago
This is a hotfix release of TorchServe v0.4.2.
Published by lxning about 3 years ago
This is the release of TorchServe v0.4.1.
Ubuntu 16.04, Ubuntu 18.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04)
Torch 1.9.0 + Cuda 10.2, 11.1
Torch 1.8.1 + Cuda 9.2, 10.1
Published by lxning over 3 years ago
This is the release of TorchServe v0.4.0.
Ubuntu 16.04, Ubuntu 18.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04)
Cuda 10.1, 10.2, 11.1
Published by dhanainme over 3 years ago
Patch release. Fixes Model Archiver to recursively copy all artifacts.
Published by maaquib almost 4 years ago
This is the release of TorchServe v0.3.0
--archive-format no-archive
(file:///) URLs
v0.2.0
Ubuntu 16.04, Ubuntu 18.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04)
Additionally, you can get started at https://pytorch.org/serve/ with installation instructions, tutorials and docs.
Lastly, if you have questions, please drop them into the PyTorch discussion forums using the 'deployment' tag, or file an issue on GitHub with a way to reproduce.