Sparsity-aware deep learning inference runtime for CPUs
License: Other
Published by jeanniefinks over 2 years ago
This is a patch release for 0.12.0 that contains the following changes:
Published by jeanniefinks over 2 years ago
Documentation:
- deepsparse.server capabilities, including single model and multi-model inferencing.
Performance:
Documentation:
- deepsparse.server, deepsparse.benchmark, and Transformer pipelines.
- deepsparse.benchmark CLI command.
- Users of arch.bin now receive a correct architecture profile of their system.
- NM_SERIAL_UNIT_GENERATION=1.
Published by jeanniefinks over 2 years ago
This is a patch release for 0.11.0 that contains the following changes:
- deepsparse.benchmark on AMD machines with the argument -pin none.
Published by jeanniefinks over 2 years ago
This is a patch release for 0.11.0 that contains the following changes:
- multi-process-benchmark.py updated to function correctly; this script allows users to measure performance using multiple separate processes in parallel.
Published by jeanniefinks over 2 years ago
- deepsparse.server integration and CLIs added with Hugging Face transformers pipelines support.
- Performance improvements made for
Published by jeanniefinks over 2 years ago
- NM_SPOOF_ARCH environment variable added for testing different architectural configurations.
- deepsparse.benchmark application is now usable from the command line after installing deepsparse, to simplify benchmarking.
- deepsparse.server CLI and API added with transformers support to make serving models like BERT with pipelines easy.
Published by jeanniefinks almost 3 years ago
This is a patch release for 0.9.0 that contains the following changes:
Published by jeanniefinks almost 3 years ago
Published by jeanniefinks almost 3 years ago
Published by jeanniefinks about 3 years ago
Published by jeanniefinks about 3 years ago
This is a patch release for 0.6.0 that contains the following changes:
Users no longer experience crashes
Published by jeanniefinks about 3 years ago
Published by jeanniefinks over 3 years ago
This is a patch release for 0.5.0 that contains the following changes:
Published by jeanniefinks over 3 years ago
- deepsparse num_sockets: crashes removed when too many sockets were requested.
Published by jeanniefinks over 3 years ago
Published by jeanniefinks over 3 years ago
This is a patch release for 0.3.0 that contains the following changes:
Published by jeanniefinks over 3 years ago
Published by jeanniefinks over 3 years ago
In rare cases where a tensor, used as the input or output to an operation, is larger than 2GB, the engine can segfault. Users should decrease the batch size as a workaround.
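The batch-size workaround can be sketched generically: split an oversized batch into smaller sub-batches and stitch the outputs back together. In this sketch, run_engine is a hypothetical stand-in for the actual engine call, and max_batch is a size you would tune until every input and output tensor fits under the 2GB limit:

```python
def run_in_chunks(run_engine, inputs, max_batch):
    """Run `inputs` through `run_engine` in sub-batches of at most
    `max_batch` items, so no single tensor exceeds the size limit."""
    outputs = []
    for start in range(0, len(inputs), max_batch):
        # Each call sees only a slice of the full batch.
        outputs.extend(run_engine(inputs[start:start + max_batch]))
    return outputs

# Demo with a dummy "engine" that doubles each value:
doubled = run_in_chunks(lambda batch: [2 * x for x in batch],
                        inputs=list(range(10)), max_batch=4)
print(doubled)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Note that the real engine is typically compiled for a fixed batch size, so in practice you would recompile the model at the smaller batch size rather than slice arbitrary lists; the chunking logic above only illustrates the idea.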
In some cases, models running complicated pre- or post-processing steps could diminish DeepSparse Engine performance by up to a factor of 10 due to hyperthreading, as two engine threads can run on the same physical core. Address the performance issue by trying the following recommended solutions in order of preference:
If that does not give performance benefit or you want to try additional options:
Use the numactl utility to prevent the process from running on hyperthreads; for example, numactl --physcpubind=0-3 python script.py pins the process to logical CPUs 0 through 3 (adjust the range to the physical cores on your system).
Manually set the thread affinity in Python as follows:
import os
from deepsparse.cpu import cpu_architecture

ARCH = cpu_architecture()

if ARCH.vendor == "GenuineIntel":
    # On Intel, physical cores are enumerated first, so bind to the
    # first num_physical_cores logical CPUs.
    os.sched_setaffinity(0, range(ARCH.num_physical_cores()))
elif ARCH.vendor == "AuthenticAMD":
    # On AMD, logical CPUs alternate between physical cores, so bind
    # to every other logical CPU.
    os.sched_setaffinity(0, range(0, 2 * ARCH.num_physical_cores(), 2))
else:
    raise RuntimeError(f"Unknown CPU vendor {ARCH.vendor}")
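After setting an affinity mask as above, it can be read back with os.sched_getaffinity to confirm it took effect. A minimal standard-library sketch (Linux-only, since the sched_* calls are not available on all platforms):

```python
import os

# Current set of logical CPUs this process may be scheduled on.
allowed = os.sched_getaffinity(0)

# Restrict the process to a single CPU from that set...
first = min(allowed)
os.sched_setaffinity(0, {first})
assert os.sched_getaffinity(0) == {first}

# ...then restore the original mask so later work is unaffected.
os.sched_setaffinity(0, allowed)
```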
Published by jeanniefinks over 3 years ago
This is a patch release for 0.1.0 that contains the following changes:
Published by jeanniefinks over 3 years ago
Welcome to our initial release on GitHub! Older release notes can be found here.