Serve, optimize and scale PyTorch models in production
APACHE-2.0 License
This is the release of TorchServe v0.11.1.
Highlights include torch.compile configuration and the tensorrt & hpu backends.
Ubuntu 20.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe requires Python >= 3.8 and JDK17.
TorchServe version | PyTorch version | Python | Stable CUDA | Experimental CUDA |
---|---|---|---|---|
0.11.1 | 2.3.0 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
0.11.0 | 2.3.0 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
0.10.0 | 2.2.1 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
0.9.0 | 2.1 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
0.8.0 | 2.0 | >=3.8, <=3.11 | CUDA 11.7, CUDNN 8.5.0.96 | CUDA 11.8, CUDNN 8.7.0.84 |
0.7.0 | 1.13 | >=3.7, <=3.10 | CUDA 11.6, CUDNN 8.3.2.44 | CUDA 11.7, CUDNN 8.5.0.96 |
TorchServe version | PyTorch version | Python | Neuron SDK |
---|---|---|---|
0.11.1 | 2.1 | >=3.8, <=3.11 | 2.18.2+ |
0.11.0 | 2.1 | >=3.8, <=3.11 | 2.18.2+ |
0.10.0 | 1.13 | >=3.8, <=3.11 | 2.16+ |
0.9.0 | 1.13 | >=3.8, <=3.11 | 2.13.2+ |
Published by lxning 5 months ago
This is the release of TorchServe v0.11.0.
torch.compile with OpenVINO backend for Stable Diffusion: example showcase of the openvino torch.compile backend with Stable Diffusion #3116 @suryasidd
TorchServe adds support for linux-aarch64 and shows an example working on AWS Graviton. This provides users with a new platform alternative for serving models on CPU.
With the XGBoost Classifier example, we show how to deploy any pickled model with TorchServe.
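The gist of that approach, as a hedged sketch (the class name, model.pkl file name, and JSON input format below are illustrative assumptions, not what the example mandates):

```python
# Minimal sketch of serving a pickled (non-PyTorch) model from a custom
# TorchServe handler. "model.pkl" and the JSON list-of-features input format
# are assumptions for illustration only.
import json
import os
import pickle


class PickledModelHandler:
    def initialize(self, context):
        model_dir = context.system_properties.get("model_dir")
        with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
            self.model = pickle.load(f)

    def handle(self, data, context):
        # One JSON payload per request in the batch; return one prediction each.
        features = [json.loads(row.get("body") or row.get("data")) for row in data]
        predictions = self.model.predict(features)
        return [str(p) for p in predictions]
```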
The ability to bypass allowed_urls using relative paths has been fixed by ensuring a preemptive check for relative paths prior to copying the model archive to the model store directory. Also, the default gRPC inference and management addresses are now set to localhost (127.0.0.1) to reduce the scope of default access to the gRPC endpoints.
Ubuntu 20.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above, and JDK17.
TorchServe version | PyTorch version | Python | Stable CUDA | Experimental CUDA |
---|---|---|---|---|
0.11.0 | 2.3.0 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
0.10.0 | 2.2.1 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
0.9.0 | 2.1 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
0.8.0 | 2.0 | >=3.8, <=3.11 | CUDA 11.7, CUDNN 8.5.0.96 | CUDA 11.8, CUDNN 8.7.0.84 |
0.7.0 | 1.13 | >=3.7, <=3.10 | CUDA 11.6, CUDNN 8.3.2.44 | CUDA 11.7, CUDNN 8.5.0.96 |
TorchServe version | PyTorch version | Python | Neuron SDK |
---|---|---|---|
0.11.0 | 2.1 | >=3.8, <=3.11 | 2.18.2+ |
0.10.0 | 1.13 | >=3.8, <=3.11 | 2.16+ |
0.9.0 | 1.13 | >=3.8, <=3.11 | 2.13.2+ |
Published by lxning 7 months ago
This is the release of TorchServe v0.10.0.
Highlights include torch.compile showcase examples.
TorchServe presented the experimental C++ backend at the PyTorch Conference 2022. Similar to the Python backend, the C++ backend also runs as a process and utilizes the BaseHandler to define APIs for customizing the handler. By providing a backend and handler written in pure C++ for TorchServe, it is now possible to deploy PyTorch models without any Python overhead. This release officially promoted the experimental branch to the master branch and included additional examples and Docker images for development.
With the launch of PT2 Inference at the PyTorch Conference 2023, we have added several key examples showcasing out-of-the-box speedups for torch.compile and AOT Compile. Since there is no new development being done in TorchScript, starting with this release, TorchServe is preparing the migration path for customers to switch from TorchScript to torch.compile.
The fast series of GenAI models (GPTFast, SegmentAnythingFast, DiffusionFast) achieve 3-10x speedups using torch.compile and native PyTorch optimizations.
To address cold start problems, an example is included to show how torch._export.aot_load (an experimental API) can be used to load a pre-compiled model, as sketched below. TorchServe has also started benchmarking models with torch.compile and tracking their performance compared to TorchScript.
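A minimal sketch of that flow, assuming a model already compiled ahead of time with torch._export.aot_compile (both APIs are experimental, so exact signatures may differ between PyTorch releases; the .so file name is an assumption):

```python
import torch

# Hypothetical artifact produced ahead of time with torch._export.aot_compile.
so_path = "resnet18_aot_compiled.so"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the pre-compiled model, avoiding compilation on the first request (cold start).
compiled_model = torch._export.aot_load(so_path, device)

with torch.inference_mode():
    output = compiled_model(torch.randn(1, 3, 224, 224, device=device))
    print(output.shape)
```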
The new TorchServe C++ backend also includes torch.compile and AOTInductor related examples for ResNet50, BERT and Llama2.
torch.compile
a. Example torch.compile with image classifier model densenet161 #2915 @agunapal
b. Example torch._export.aot_compile with image classification model ResNet-18 #2832 #2906 #2932 #2948 @agunapal
c. Example torch inductor fx graph caching with image classification model densenet161 #2925 @agunapal
C++ AOTInductor
a. Example AOT Inductor with Llama2 #2913 @mreso
b. Example AOT Inductor with ResNet-50 #2944 @lxning
c. Example AOT Inductor with BERTSequenceClassification #2931 @lxning
TorchServe has implemented token authentication for management and inference APIs. This is an optional config and can be enabled using the torchserve-endpoint-plugin, which can be downloaded from Maven. This further strengthens TorchServe's capability as a secure model serving solution. The security features of TorchServe are documented here.
TorchServe is now supported on Apple Silicon Macs. The current support is for CPU only. We have also posted an RFC for the deprecation of x86 Mac support.
While serving large models, model loading can take some time even though the pod is running. Even though TorchServe is up, the worker is not ready until the model is loaded. To address this, TorchServe now sets the model ready status in KServe after the model has been loaded on workers. TorchServe also includes native open inference protocol support in gRPC. This is an experimental feature.
In order to extend backwards compatibility support for metrics, auto-detection of backend metrics enables the flexibility to publish custom model metrics without having to explicitly specify them in the metrics configuration file. Furthermore, a customized script to collect system metrics is also now supported.
Ubuntu 20.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above, and JDK17.
TorchServe version | PyTorch version | Python | Stable CUDA | Experimental CUDA |
---|---|---|---|---|
0.10.0 | 2.2.1 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
0.9.0 | 2.1 | >=3.8, <=3.11 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 |
0.8.0 | 2.0 | >=3.8, <=3.11 | CUDA 11.7, CUDNN 8.5.0.96 | CUDA 11.8, CUDNN 8.7.0.84 |
0.7.0 | 1.13 | >=3.7, <=3.10 | CUDA 11.6, CUDNN 8.3.2.44 | CUDA 11.7, CUDNN 8.5.0.96 |
TorchServe version | PyTorch version | Python | Neuron SDK |
---|---|---|---|
0.10.0 | 1.13 | >=3.8, <=3.11 | 2.16+ |
0.9.0 | 1.13 | >=3.8, <=3.11 | 2.13.2+ |
Published by lxning about 1 year ago
This is the release of TorchServe v0.9.0.
Our security process is documented here
We rely heavily on automation to improve the security of torchserve, namely by keeping our gradle and pip dependencies updated.
A key point to remember is that torchserve will allow you to configure things in an insecure way, so make sure to read our security docs and relevant security warnings to ensure your product is secure in production. In general, we do not encourage you to download untrusted mar files from the internet; running a .mar file is effectively running arbitrary Python code, so make sure to unzip mar files and validate that they are not doing anything suspicious.
Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above, and JDK17.
Torch 2.1.0 + Cuda 11.8, 12.1
Torch 2.0.1 + Cuda 11.7
Torch 2.0.0 + Cuda 11.7
Torch 1.13 + Cuda 11.7
Torch 1.11 + Cuda 10.2, 11.3, 11.6
Torch 1.9.0 + Cuda 11.1
Torch 1.8.1 + Cuda 9.2
Published by lxning about 1 year ago
This is the release of TorchServe v0.8.2.
add_metric is now backwards compatible with versions [< v0.6.1], but the default metric type is inferred to be COUNTER. If the metric is of a different type, it needs to be specified in the call to add_metric, for example:
metrics.add_metric(name='GenericMetric', value=10, unit='count', dimensions=[...], metric_type=MetricTypes.GAUGE)
Where applicable, replace the call to add_metric with add_metric_to_cache.
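As a hedged sketch, a custom handler might emit such a metric as below (the metric name and empty dimension list are illustrative assumptions; the import path should be checked against the TorchServe version in use):

```python
# Sketch of emitting a custom GAUGE metric from a handler's inference path.
# "QueueDepth" is an illustrative metric name, not one TorchServe defines.
from ts.metrics.metric_type_enum import MetricTypes
from ts.torch_handler.base_handler import BaseHandler


class MyHandler(BaseHandler):
    def inference(self, data, *args, **kwargs):
        result = super().inference(data, *args, **kwargs)
        metrics = self.context.metrics
        metrics.add_metric(
            name="QueueDepth",
            value=len(data),
            unit="count",
            dimensions=[],
            metric_type=MetricTypes.GAUGE,
        )
        return result
```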
Example LLama v2 70B chat using HuggingFace Accelerate #2494 @lxning @HamidShojanazeri @agunapal
large model example OPT-6.7B on Inferentia2 #2399 @namannandan
DeepSpeed deferred init with OPT-30B #2419 @agunapal
Deferred model init in the OPT-30B example by leveraging the new DeepSpeed version. This feature can significantly reduce model loading latency.
Torch TensorRT example #2483 @agunapal
K8S mnist example using minikube #2323 @agunapal
Example for custom metrics #2516 @namannandan
Example for object detection with ultralytics YOLO v8 model #2508 @agunapal
nvidia/cuda:11.7.1-base-ubuntu20.04 in GPU docker image #2442 @agunapal
Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above, and JDK17.
Torch 2.0.1 + Cuda 11.7, 11.8
Torch 2.0.0 + Cuda 11.7, 11.8
Torch 1.13 + Cuda 11.7, 11.8
Torch 1.11 + Cuda 10.2, 11.3, 11.6
Torch 1.9.0 + Cuda 11.1
Torch 1.8.1 + Cuda 9.2
Published by lxning over 1 year ago
This is the release of TorchServe v0.8.1.
Because pre- and post-processing are often carried out on the CPU, the GPU sits idle until the two CPU-bound steps have finished and the worker receives a new batch. Micro-batching in the handler makes it possible to run inference, pre-processing, and post-processing for a batch request from the frontend in parallel, as roughly illustrated below.
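A toy sketch of the idea (not the TorchServe micro-batching API): split a batch into micro-batches and overlap the CPU-bound stages with GPU-bound inference.

```python
# Toy sketch: overlap CPU-bound pre/post-processing with GPU-bound inference
# by processing the batch as smaller micro-batches in a small thread pool.
from concurrent.futures import ThreadPoolExecutor


def micro_batched_handle(batch, preprocess, infer, postprocess, micro_batch_size=8):
    micro_batches = [batch[i:i + micro_batch_size]
                     for i in range(0, len(batch), micro_batch_size)]
    results, post_futures = [], []
    with ThreadPoolExecutor(max_workers=2) as cpu_pool:
        # Preprocess all micro-batches in the background.
        pre_futures = [cpu_pool.submit(preprocess, mb) for mb in micro_batches]
        for fut in pre_futures:
            out = infer(fut.result())                              # GPU-bound step
            post_futures.append(cpu_pool.submit(postprocess, out))  # overlaps next infer
        for fut in post_futures:
            results.extend(fut.result())
    return results
```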
This feature helps with use cases where inference latency can be high, such as generative models and auto-regressive decoder models like ChatGPT. Based on business requirements, applications can take effective actions, for example routing the rejected request to a different server or scaling up model server capacity.
This example demonstrates creative content assisted by generative AI by using TorchServe on SageMaker MME.
Upgraded to PyTorch 2.0.1 #2374 @namannandan
Significant reduction in Docker Image Size
GPU
pytorch/torchserve 0.8.1-gpu 04eef250c14e 4 hours ago 2.34GB
pytorch/torchserve 0.8.0-gpu 516bb13a3649 4 weeks ago 5.86GB
pytorch/torchserve 0.6.0-gpu fb6d4b85847d 12 months ago 2.13GB
CPU
pytorch/torchserve 0.8.1-cpu 68a3fcae81af 4 hours ago 662MB
pytorch/torchserve 0.8.0-cpu 958ef6dacea2 4 weeks ago 2.37GB
pytorch/torchserve 0.6.0-cpu af91330a97bd 12 months ago 496MB
Updated CPU information for IPEX #2372 @min-jean-cho
Fixed inf2 example handler #2378 @namannandan
Added inf2 nightly benchmark #2283 @namannandan
Fixed archiver tgz format model directory structure mismatch on SageMaker #2405 @lxning
Fixed model archiver to fail if extra files are missing #2212 @mreso
Fixed device type setting in model config yaml #2408 @lxning
Fixed batchsize in config.properties not honored #2382 @lxning
Upgraded torchrun argument names and fixed backend tcp port connection #2377 @lxning
Fixed error thrown while loading multiple models in KServe #2235 @jagadeeshi2i
Fixed KServe fastapi migration issues #2175 @jagadeeshi2i
Added type annotation in model_server.py #2384 @josephcalise
Speed up unit test by removing sleep in start/stop torchserve #2383 @mreso
Removed cu118 from regression tests #2380 @agunapal
Enabled ONNX CI test #2363 @msaroufim
Removed session_mocker usage to prevent test cross talking #2375 @mreso
Enabled regression test in CI #2370 @msaroufim
Fixed regression test failures #2371 @namannandan
Bump up transformers version from 4.28.1 to 4.30.0 #2410
Fixed links in FAQ #2351 @sekyondaMeta
Fixed broken links in index.md #2329 @sekyondaMeta
Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above, and JDK17.
Torch 2.0.1 + Cuda 11.7, 11.8
Torch 2.0.0 + Cuda 11.7, 11.8
Torch 1.13 + Cuda 11.7, 11.8
Torch 1.11 + Cuda 10.2, 11.3, 11.6
Torch 1.9.0 + Cuda 11.1
Torch 1.8.1 + Cuda 9.2
Published by lxning over 1 year ago
This is the release of TorchServe v0.8.0.
TorchServe added deep integration to support large model inference. It provides a PyTorch-native large model inference solution by integrating PiPPy. It also provides the flexibility and extensibility to support other popular libraries such as Microsoft DeepSpeed and HuggingFace Accelerate.
To improve UX in Generative AI inference, TorchServe allows sending intermediate token responses to the client side by supporting gRPC server-side streaming and HTTP 1.1 chunked encoding.
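In a custom handler this looks roughly like the following hedged sketch (based on the streaming example; the helper's location and signature should be checked against the TorchServe version in use, and the token loop stands in for a real generation loop):

```python
# Sketch of streaming partial results from a custom handler method.
from ts.protocol.otf_message_handler import send_intermediate_predict_response


def stream_tokens(tokens, context):
    # Push every token except the last as an intermediate response over
    # gRPC server-side streaming or HTTP 1.1 chunked encoding.
    for token in tokens[:-1]:
        send_intermediate_predict_response(
            [token], context.request_ids, "Intermediate Prediction success", 200, context
        )
    return [tokens[-1]]  # the final chunk goes back through the normal return path
```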
By leveraging torch.compile, it's now possible to run TorchServe using XLA, which is optimized for both GPU and TPU deployments.
TorchServe fully supports metrics in Prometheus mode or Log mode. Both frontend and backend metrics can be configured in a central metrics YAML file.
Added a config-file option for model config to the model archiver tool. Users are able to flexibly define customized parameters in this YAML file and easily access them in the backend handler via the variable context.model_yaml_config. This new feature also makes it easier for TorchServe to support the other new features and enhancements.
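For example, a handler might read its own keys from that YAML (a minimal sketch; the "handler"/"my_param" keys are made-up names for illustration, not a required schema):

```python
# Sketch: reading values defined in model_config.yaml (passed to the archiver
# via --config-file) from inside a custom handler. Key names are illustrative.
from ts.torch_handler.base_handler import BaseHandler


class MyHandler(BaseHandler):
    def initialize(self, context):
        super().initialize(context)
        yaml_config = getattr(context, "model_yaml_config", {}) or {}
        self.my_param = yaml_config.get("handler", {}).get("my_param", "default")
```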
We've refactored our model optimization utilities and improved logging to help debug compilation issues. We've also deprecated compile.json in favor of the new YAML config format; follow our guide to learn more: https://github.com/pytorch/serve/blob/master/examples/pt2/README.md. The main difference is that while archiving a model, instead of passing in compile.json via --extra-files, we can pass in --config-file model_config.yaml.
By default, TorchServe uses a round-robin algorithm to assign GPUs to a worker on a host. Starting from v0.8.0, TorchServe allows users to define deviceIds in model_config.yaml to assign GPUs to a model.
TorchServe supports hybrid mode on a GPU host. Users are able to define deviceType in the model config YAML file to deploy a model on the CPU of a GPU host.
TorchServe allows users to define clientTimeoutInMills in a model config YAML file. If clientTimeoutInMills is set, TorchServe calculates the expiration timestamp of an incoming inference request and drops the request once it has expired.
Supported maxRetryTimeoutInSec, which defines the maximum time window for recovering a dead backend worker of a model, in the model config YAML file. The default value is 5 min, and users are able to adjust it in the model config YAML file. The ping endpoint returns 200 if all models have enough healthy workers (i.e., equal to or more than minWorkers); otherwise it returns 500.
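Taken together, the per-model settings described above map onto model_config.yaml keys along these lines, shown here as an equivalent Python dict for illustration (the values are assumptions, not defaults):

```python
# Illustrative values only; deviceType, deviceIds, clientTimeoutInMills and
# maxRetryTimeoutInSec are the model config keys described above.
model_config = {
    "deviceType": "gpu",           # or "cpu" to keep a model on the CPU of a GPU host
    "deviceIds": [0, 1],           # GPUs assigned to this model instead of round-robin
    "clientTimeoutInMills": 3000,  # drop requests that have waited longer than 3 s
    "maxRetryTimeoutInSec": 300,   # window for recovering a dead backend worker (default 5 min)
}
```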
Example of Pippy onboarding Open platform framework for distributed model inference #2215 @HamidShojanazeri
Example of DeepSpeed onboarding Open platform framework for distributed model inference #2218 @lxning
Example of Stable diffusion v2 #2009 @jagadeeshi2i
Upgraded to PyTorch 2.0 #2194 @agunapal
Enabled Core pinning in CPU nightly benchmark #2166 #2237 @min-jean-cho
TorchServe can be used with Intel® Extension for PyTorch* to give a performance boost on Intel hardware. Intel® Extension for PyTorch* is a Python package extending PyTorch with up-to-date features and optimizations that take advantage of AVX-512 Vector Neural Network Instructions (AVX512 VNNI), Intel® Advanced Matrix Extensions (Intel® AMX), and more.
Enabling core pinning in the TorchServe CPU nightly benchmark shows a significant performance speedup. This feature is implemented via a script under the PyTorch Xeon backend, initiated from Intel® Extension for PyTorch*. To try out core pinning on your workload, add cpu_launcher_enable=true in config.properties.
To try out more optimizations with Intel® Extension for PyTorch*, install Intel® Extension for PyTorch* and add ipex_enable=true in config.properties.
In case of OOM, return error code 507 instead of the generic code 503.
Fixed Error thrown in KServe while loading multi-models #2235 @jagadeeshi2i
Added Docker CI for TorchServe #2226 @fabridamicelli
Change docker image release from dev to production #2227 @agunapal
Supported building docker images with specified Python version #2154 @agunapal
Model archiver optimizations:
a). Added wildcard file search in model archiver --extra-files #2142 @gustavhartz
b). Added zip-store option to model archiver tool #2196 @mreso
c). Made model archiver tests runnable from any directory #2191 @mreso
d). Supported tgz format model decompression in TorchServe frontend #2214 @lxning
Automatically flag deviation of metrics from the average of last 30 runs
This study compares TPS between TorchServe with Nvidia MPS enabled and TorchServe without Nvidia MPS on P3 and G4 instances. It can help with the decision of whether to enable MPS for your deployment.
Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above, and JDK17.
Torch 2.0.0 + Cuda 11.7, 11.8
Torch 1.13 + Cuda 11.7, 11.8
Torch 1.11 + Cuda 10.2, 11.3, 11.6
Torch 1.9.0 + Cuda 11.1
Torch 1.8.1 + Cuda 9.2
Published by lxning over 1 year ago
This is the release of TorchServe v0.7.1.
Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above, and JDK17.
Torch 1.13 + Cuda 11.7
Torch 1.11 + Cuda 10.2, 11.3, 11.6
Torch 1.9.0 + Cuda 11.1
Torch 1.8.1 + Cuda 9.2
Published by lxning almost 2 years ago
This is the release of TorchServe v0.7.0.
Better Transformer / Flash Attention & Xformer Memory Efficient provide out-of-the-box performance with major speedups for PyTorch Transformer encoders. This has been integrated into the Torchserve HF Transformer example; please read more about this integration here.
The main speedups in Better Transformer come from exploiting sparsity on padded inputs and kernel fusions. As a result, you will see the biggest gains when dealing with larger workloads, such as sequences with longer padding and larger batch sizes.
In our benchmarks on P3 instances with 4 V100 GPUs, using Torchserve benchmarking workloads, throughput has shown significant improvement with large batch sizes: a 45.5% increase with batch size 8, 50.8% with batch size 16, 45.2% with batch size 32, 47.2% with batch size 64, and 17.2% with batch size 4. These numbers can vary based on your workload (batch size, padding percentage) and your hardware. Please look up some other benchmarks in the blog post.
torch.compile() support https://github.com/pytorch/serve/pull/1960 @msaroufim
We've added experimental support for PT 2.0, i.e. torch.compile() support within torchserve. To use it, you need to supply a file compile.json when archiving your model to specify which backend you want. We've also enabled mode=reduce-overhead by default, which is ideally suited for the smaller batch sizes that are more common for inference. For now, we recommend leveraging GPUs with tensor cores available, like A10G or A100, since you're likely to see the greatest speedups there.
On training we've seen speedups ranging from 30% to 2x (https://pytorch.org/get-started/pytorch-2.0/), but we haven't run any performance benchmarks for inference yet. Until then, we recommend you continue leveraging other runtimes like TensorRT or IPEX for accelerated inference, which we highlight in our performance_guide.md. There are a few important caveats to consider when using torch.compile: changes in batch sizes will cause recompilations, so make sure to leverage a small batch size; there will be additional overhead to start a model since you need to compile it first; and you'll likely still see the largest speedups with TensorRT.
However, we hope that adding this support will make it easier for you to benchmark and try out PT 2.0. Learn more here https://github.com/pytorch/serve/tree/master/examples/pt2
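Under the hood, this amounts to something like the following hedged sketch (the backend value would come from your compile.json; "inductor" is just a common choice, not stated by this release as the TorchServe default):

```python
import torch

# Compile an eager model with the inference-friendly mode described above.
# Keep the batch size fixed to avoid recompilations.
model = torch.nn.Linear(10, 10).eval()
compiled_model = torch.compile(model, backend="inductor", mode="reduce-overhead")

with torch.inference_mode():
    output = compiled_model(torch.randn(4, 10))
```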
Ubuntu 16.04, Ubuntu 18.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above, and JDK17.
Torch 1.13 + Cuda 11.7
Torch 1.11 + Cuda 10.2, 11.3, 11.6
Torch 1.9.0 + Cuda 11.1
Torch 1.8.1 + Cuda 9.2
Published by lxning almost 2 years ago
This is the release of TorchServe v0.6.1.
install_from_src.py https://github.com/pytorch/serve/pull/1856 @msaroufim
examples/intel_extension_for_pytorch/README.md https://github.com/pytorch/serve/pull/1816 @min-jean-cho
ci/benchmark/buildspec.yml https://github.com/pytorch/serve/pull/1658 @lxning
docker/Dockerfile.neuron.dev https://github.com/pytorch/serve/pull/1775 in favor of AWS SageMaker DLC @rohithkrn
LICENSE.txt https://github.com/pytorch/serve/pull/1801 @msaroufim
Ubuntu 16.04, Ubuntu 18.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above, and JDK17.
Torch 1.11+ Cuda 10.2, 11.3, 11.6
Torch 1.9.0 + Cuda 11.1
Torch 1.8.1 + Cuda 9.2
Published by lxning over 2 years ago
This is the release of TorchServe v0.6.0.
buildspec.yaml - Added fixing for gpu regression test buildspec.yaml.
benchmark/automated directory in favor of new Github Action based workflow.
benchmark-ab.py report.
torch < 1.8.1 - Added exception to notify torch < 1.8.1.
install_dependencies.py - Added sys.executable in install_dependencies.py.
model_zoo.md - Added dog breed, mmf and BERT in model zoo.
nvgpu in common requirements - Added nvgpu in common dependencies.
Ubuntu 16.04, Ubuntu 18.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above.
Torch 1.11+ Cuda 10.2, 11.3
Torch 1.9.0 + Cuda 11.1
Torch 1.8.1 + Cuda 9.2
Published by lxning over 2 years ago
This is the release of TorchServe v0.5.3.
pip install torchserve-nightly
Ubuntu 16.04, Ubuntu 18.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above.
Torch 1.10+ Cuda 10.2, 11.3
Torch 1.9.0 + Cuda 11.1
Torch 1.8.1 + Cuda 9.2
Published by lxning almost 3 years ago
This is a hotfix release for the Log4j issue.
Published by lxning almost 3 years ago
This is a hotfix release for the Log4j issue.
Published by lxning almost 3 years ago
This is the release of TorchServe v0.5.0.
TS_CONFIG_FILE as env var.
Ubuntu 16.04, Ubuntu 18.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04)
Torch 1.10+ Cuda 10.2, 11.3
Torch 1.9.0 + Cuda 11.1
Torch 1.8.1 + Cuda 9.2
Published by lxning about 3 years ago
This is a hotfix release of TorchServe v0.4.2.
Published by lxning about 3 years ago
This is the release of TorchServe v0.4.1.
Ubuntu 16.04, Ubuntu 18.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04)
Torch 1.9.0 + Cuda 10.2, 11.1
Torch 1.8.1 + Cuda 9.2, 10.1
Published by lxning over 3 years ago
This is the release of TorchServe v0.4.0.
Ubuntu 16.04, Ubuntu 18.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04)
Cuda 10.1, 10.2, 11.1
Published by dhanainme over 3 years ago
Patch release. Fixes Model Archiver to recursively copy all artifacts.
Published by maaquib almost 4 years ago
This is the release of TorchServe v0.3.0
--archive-format no-archive
(file:///) URLs
v0.2.0
Ubuntu 16.04, Ubuntu 18.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04)
Additionally, you can get started at https://pytorch.org/serve/ with installation instructions, tutorials and docs.
Lastly, if you have questions, please drop them into the PyTorch discussion forums using the 'deployment' tag, or file an issue on GitHub with a way to reproduce.