Sparsity-aware deep learning inference runtime for CPUs
This is a patch release for 1.7.0 that contains the following changes:
`sentencepiece`-based tokenizers. (#1635)
Published by jeanniefinks 7 months ago
`deepsparse.evaluate` APIs and CLIs added with plugins for perplexity and lm-eval-harness for LLM evaluations. (#1596)
`sequence_length` for greater control over text generation pipelines. (#1518)
`deepsparse.analyze` functionality has been updated to work properly with LLMs. (#1324)
`kv_cache` input while using external KV cache management, which resulted in inaccurate model inference for ONNX Runtime comparison pathways. (#1337)
`scipy` and crash. (#1604, #1602)
Published by jeanniefinks 10 months ago
This is a patch release for 1.6.0 that contains the following changes:
Renamed `LICENSE-NEURALMAGIC` to `LICENSE` for higher visibility in the DeepSparse GitHub repository and the C++ engine package tarball, deepsparse_api_demo.tar.gz. (#1485)
Published by jeanniefinks 10 months ago
Version support added:
Decoder-only text generation LLMs are optimized in DeepSparse and offer state-of-the-art performance with sparsity! Run `pip install deepsparse[llm]` and then use the TextGeneration pipeline. For performance details, check out our Sparse Fine-Tuning paper.
(#1022, #1035, #1061, #1081, #1132, #1122, #1137, #1121, #1139, #1126, #1151, #1140, #1173, #1166, #1176, #1172, #1190, #1142, #1205, #1204, #1212, #1214, #1194, #1218, #1196, #1217, #1216, #1225, #1240, #1254, #1246, #1250, #1266, #1270, #1276, #1274, #1235, #1284, #1285, #1304, #1308, #1310, #1313, #1272)
OpenAI-compatible DeepSparse Server has been added, enabling standard OpenAI requests for performant LLMs. (#1171, #1221, #1228, #1317)
MLServer-compatible pathways for DeepSparse Server to enable standard MLServer requests. (#1237)
CLIP model support for deployments and performance functionality is now enabled. (Documentation) (#1098, #1145, #1203)
Several encoder-decoder networks have been optimized for performance: Donut, Whisper, and T5.
Support for ARM processors is now generally available. ARMv8.2 or above is required for quantized performance. (#1307)
Support for macOS is now in Beta. macOS Ventura (version 13) or above and Apple silicon are required. (#1088, #1096, #1290, #1307)
DeepSparse Server updated to support generic pipeline Python implementations for easy extensibility. (#1033)
YOLOv8 deployment pipelines and model support have been added. (#1044, #1052, #1040, #1138, #1261)
AWS and GCP marketplace documentation added: AWS | GCP (#1056, #1057)
DigitalOcean marketplace integration added. (Documentation) (#1109)
DeepSparse Azure marketplace integration added. (Documentation) (#1066)
DeepSparse Pipeline timing added. To access, use `pipeline.timer_manager` or the `deepsparse.benchmark_pipeline` CLI. (#1062, #1150, #1268, #1259, #1294)
`TorchScriptEngine` class added to enable benchmarking and evaluation comparisons to DeepSparse. (#1015)
`debug_analysis` API now supports exporting CSVs, enabling easier analysis. (#1253)
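As an illustration of that kind of export, per-layer analysis rows can be written to CSV text with the standard library. This is a hedged sketch only: the function name and the `name`/`time_ms` fields are assumptions, not the actual `debug_analysis` schema.

```python
import csv
import io

def analysis_to_csv(rows):
    # Write per-layer analysis rows to CSV text. "name" and "time_ms" are
    # illustrative field names, not the API's actual schema.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name", "time_ms"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```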
SentenceTransformers deployment and performance support have been added. (#1301)
DeepSparse upgraded for the SparseZoo V2 model file structure changes, which expands the number of supported files and reduces the number of bytes that need to be downloaded for model checkpoints, folders, and files. (#1233, #1234, #1303, #1318)
YOLOv5 deployment pipelines migrated to install from `nm-yolov5` on PyPI, removing the autoinstall from the `nm-yolov5` GitHub repository that previously happened on invocation of the relevant pathways, enabling more predictable environments. (#1030, #1101, #1129, #1111, #1167)
Docker builds are updated to consistently rebuild for new releases and nightlies. (#1012, #1068, #1069, #1113, #1144)
Torchvision deployment pipelines have been upgraded to support 0.14.x. (#1034)
README and documentation updated to include: Slack Community name change, Contact Us form introduction, Python version changes; corrections for YOLOv5 torchvision, transformers, and SparseZoo broken links; and installation command. (#1041, #1042, #1043, #1039, #1048, #931, #960, #1279, #1282, #1280, #1313)
Python 3.7 is now deprecated. (#1060, #1148)
ONNX utilities are updated so that ONNX model arguments can be passed as either a model file path (past behavior) or an ONNX ModelProto Python object. (#1089)
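A minimal, stdlib-only sketch of that dual-input pattern, assuming nothing about the real utilities beyond what is stated above (real code would dispatch on `onnx.ModelProto` and call `onnx.load` for the path case; the name `resolve_model` and return shape here are hypothetical):

```python
def resolve_model(model):
    # Hypothetical stand-in: accept either a file path (str/bytes) or an
    # already-parsed in-memory model object, and tag which form was given.
    # The real ONNX utilities would return onnx.load(model) for the path case.
    if isinstance(model, (str, bytes)):
        return {"source": "path", "value": model}
    return {"source": "in_memory", "value": model}
```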
Deployment directories containing a `model.onnx` file will now load properly for all pipelines supported by DeepSparse Server. Previously, the exact path to the `model.onnx` file had to be supplied rather than a deployment directory. (#1131)
Flake8 updated to 6.1 to enable the latest standards for running `make quality`. (#1156)
Automatic link checking has been added to GitHub actions. (#1226)
DeepSparse Pipeline is now printable: `__str__` and `__repr__` are implemented and show useful information when a pipeline is printed. (#1298)
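A toy version of that change (a minimal sketch, not DeepSparse's actual class or constructor signature) looks like:

```python
class Pipeline:
    # Minimal sketch: implementing __repr__ (and aliasing __str__ to it)
    # makes printed pipelines show useful configuration instead of the
    # default <object at 0x...> form.
    def __init__(self, task, model_path):
        self.task = task
        self.model_path = model_path

    def __repr__(self):
        return f"{type(self).__name__}(task={self.task!r}, model_path={self.model_path!r})"

    __str__ = __repr__
```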
The `nm-transformers` package has been fully removed and replaced with the native transformers package that works with DeepSparse. (#1302)
`deepsparse.benchmark` was failing with an `AttributeError` when the `-shapes` argument was supplied, causing no benchmarks to be measured. (#1071)
`model.onnx` file in the model directory was causing the server to raise an exception for image classification pipelines. (#1070)
`generate_random_inputs` function no longer creates random data with shape 0 when ONNX files containing dynamic dimensions are given. (#1086)
When `num_cores` was not supplied as an explicit kwarg for a bucketing pipeline, it would trigger a KeyError. This is now updated to ensure the pipeline works correctly without `num_cores` being explicitly supplied as a kwarg. (#1152)
`eval_downstream` for Transformers pathways no longer fails due to a PyTorch requirement not being installed. The fix removes the PyTorch dependency so the evaluation runs through correctly. (#1187)
`test_pipeline_call_is_async` has been improved to produce consistent test results. (#1251, #1264, #1267)
Published by jeanniefinks about 1 year ago
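The dynamic-dimension fix in the list above can be sketched as follows. This is an illustrative helper, not the actual DeepSparse utility: dynamic dims (None, or a symbolic string like "batch") are replaced with a positive default rather than 0, so the generated tensor is never empty.

```python
import random

def generate_random_input(shape, default_dim=1):
    # Replace dynamic dimensions (None, or a string such as "batch") with a
    # positive default so the element count is never zero.
    concrete = [d if isinstance(d, int) and d > 0 else default_dim for d in shape]
    count = 1
    for d in concrete:
        count *= d
    return concrete, [random.random() for _ in range(count)]
```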
This is a patch release for 1.5.0 that contains the following changes:
Published by jeanniefinks over 1 year ago
`deepsparse.benchmark_sweep` CLI to enable sweeps of benchmarks across different settings such as cores and batch sizes. (#860)
`Engine.generate_random_inputs()` API (#966)
`pip install deepsparse[transformers]` and `pip install deepsparse[yolov5]` will need to be used.
When given a `num_streams` parameter that is smaller than the number of cores, multi-stream and elastic scheduler behaviors have been improved. Previously, DeepSparse would divide the system into `num_streams` chunks and fill each chunk until it ran out of threads. Now, each stream uses a number of threads equal to `num_cores` divided by `num_streams`, with the remainder distributed in a round-robin fashion.
In networks with a Clip operator where min isn't equal to zero, performance bugs no longer occur.
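The new allocation rule described above can be sketched as a small helper (hypothetical code, not the engine's implementation): each stream gets the integer quotient of cores over streams, and the remainder is handed out one thread per stream in round-robin order.

```python
def threads_per_stream(num_cores, num_streams):
    # num_cores // num_streams threads per stream, with the remainder
    # distributed one extra thread at a time in round-robin order.
    base, remainder = divmod(num_cores, num_streams)
    return [base + (1 if i < remainder else 0) for i in range(num_streams)]
```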
Crashing eliminated:
`ignore_labels`. (#903)
Assertion errors/failures removed:
Published by jeanniefinks over 1 year ago
This is a patch release for 1.4.0 that contains the following changes:
Published by jeanniefinks over 1 year ago
This is a patch release for 1.3.0 that contains the following changes:
Published by jeanniefinks almost 2 years ago
`default_precision` parameter in the configuration file.
`engine` class.
`warn`.
`axes` parameter to be specified either as an input or an attribute in several ONNX operators.
Published by jeanniefinks almost 2 years ago
`time.perf_counter` for more accurate benchmarks.
`num_streams` provided to the `engine_context_t` is greater than the number of physical CPU cores.
Published by jeanniefinks about 2 years ago
`num_streams` provided to the `engine_context_t` is greater than the number of physical CPU cores.
Published by jeanniefinks over 2 years ago
This is a patch release for 1.0.0 that contains the following changes:
Published by jeanniefinks over 2 years ago
This is a patch release for 1.0.0 that contains the following changes:
Crashes with an assertion failure no longer happen in the following cases:
Setting the `num_streams` parameter to fewer than the number of NUMA nodes.
The engine no longer enters an infinite loop when an operation has multiple inputs coming from the same source.
Error messaging improved for installation failures of non-supported operating systems.
Supported transformers `datasets` version capped for compatibility with pipelines.
Published by jeanniefinks over 2 years ago
`num_streams` argument to tune the number of requests that are processed in parallel.
Known issue: setting the `num_streams` parameter to fewer than the number of NUMA nodes; hotfix forthcoming.
Published by jeanniefinks over 2 years ago
This is a patch release for 0.12.0 that contains the following changes: