deepsparse

Sparsity-aware deep learning inference runtime for CPUs


deepsparse - DeepSparse v0.12.1 Patch Release

Published by jeanniefinks over 2 years ago

This is a patch release for 0.12.0 that contains the following changes:

  • Improper label mapping no longer causes crashes in validation flows within DeepSparse transformers.
  • DeepSparse Server now exposes proper routes for SageMaker.
  • Fixed a DeepSparse Server dependency issue that installed an old version of a library, which caused crashes in some use cases.
deepsparse - DeepSparse v0.12.0

Published by jeanniefinks over 2 years ago

Changes:

Performance:

  • Speedup for large batch sizes when using sync mode on AMD EPYC processors.
  • AVX2 improvements:
    • Up to 40% speedup out of the box for dense quantized models.
    • Up to 20% speedup for pruned quantized BERT, ResNet-50 and MobileNet.
  • Speedup from sparsity realized for ConvInteger operators.
  • Model compilation time decreased on systems with many cores.
  • Multi-stream Scheduler: certain computations that were previously executed at runtime are now precomputed.
  • Hugging Face Transformers integration updated to latest state from upstream main branch.

Resolved Issues:

  • When running quantized BERT with a sequence length not divisible by 4, the DeepSparse Engine will no longer disable optimizations and see very poor performance.
  • Users executing arch.bin now receive a correct architecture profile of their system.

Known Issues:

  • When running the DeepSparse engine on a system with a nonuniform system topology, for example, an AMD EPYC processor where some cores per core-complex (CCX) have been disabled, model compilation will never terminate. A workaround is to set the environment variable NM_SERIAL_UNIT_GENERATION=1.
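
As an illustration of that workaround, here is a minimal sketch (assuming the deepsparse.compile_model Python API; the ONNX path and batch size are placeholders) that sets the variable before the engine compiles the model:

    import os

    # Set the workaround flag before deepsparse compiles the model so the
    # engine sees it at compilation time.
    os.environ["NM_SERIAL_UNIT_GENERATION"] = "1"

    from deepsparse import compile_model

    engine = compile_model("model.onnx", batch_size=1)  # placeholder model file
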
deepsparse - DeepSparse v0.11.2 Patch Release

Published by jeanniefinks over 2 years ago

This is a patch release for 0.11.0 that contains the following changes:

  • Fixed an assertion error that would occur when using deepsparse.benchmark on AMD machines with the argument -pin none.

Known Issues:

  • When running quantized BERT with a sequence length not divisible by 4, the DeepSparse Engine will disable optimizations and see very poor performance.
deepsparse - DeepSparse v0.11.1 Patch Release

Published by jeanniefinks over 2 years ago

This is a patch release for 0.11.0 that contains the following changes:

  • When running NanoDet-Plus-m, the DeepSparse Engine will no longer fail with an assertion (See #279).
  • The DeepSparse Engine now respects the CPU affinity set by the calling thread. This is essential for the new command-line (CLI) tool multi-process-benchmark.py, which allows users to measure performance using multiple separate processes in parallel.
  • Fixed a performance regression on BERT batch size 1 sequence length 128 models.
deepsparse - DeepSparse v0.11.0

Published by jeanniefinks over 2 years ago

New Features:

  • High-performance sparse quantized convolutional neural networks supported on AVX2 systems.
  • CCX detection added to the DeepSparse Engine for AMD systems.
  • deepsparse.server integration and CLIs added with Hugging Face transformers pipelines support.
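
To show how the server integration is typically exercised from the client side, here is a hedged sketch; the port (5543) and the /predict route are assumptions used for illustration and should be checked against the routes the server prints at startup:

    # Hedged client-side sketch for a locally running deepsparse.server instance.
    # Host, port, route, and payload keys below are illustrative assumptions.
    import requests

    payload = {"sequences": "DeepSparse makes sparse transformer inference fast on CPUs."}
    response = requests.post("http://localhost:5543/predict", json=payload)
    print(response.json())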

Changes:

Performance improvements made for:

  • FP32 sparse BERT models
  • batch size 1 networks
  • quantized sparse BERT models
  • Pooling operations

Resolved Issues:

  • When hyperthreads are disabled in the BIOS, core/socket information on certain systems can now be detected.
  • Hugging Face transformers validation flows for QQP now give correct accuracy metrics.
  • PyTorch downloads for YOLO model stubs are now supported.

Known Issues:

  • When running NanoDet-Plus-m, the DeepSparse Engine will fail with an assertion (See #279). A hotfix is being pursued.
deepsparse - DeepSparse v0.10.0

Published by jeanniefinks over 2 years ago

New Features:

  • Quantization support enabled on AVX2 instruction set for GEMM and elementwise operations.
  • NM_SPOOF_ARCH environment variable added for testing different architectural configurations.
  • Elastic scheduler implemented as an alternative to the single-stream or multi-stream schedulers (see the sketch after this list).
  • deepsparse.benchmark application is now usable from the command-line after installing deepsparse to simplify benchmarking.
  • deepsparse.server CLI and API added with transformers support to make serving models like BERT with pipelines easy.
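
To illustrate the scheduler choice mentioned above, here is a minimal sketch; the scheduler keyword on compile_model mirrors later DeepSparse releases and is an assumption for this version, and the model path is a placeholder:

    from deepsparse import compile_model

    # "elastic" is the new scheduler; "single_stream" and "multi_stream" are the
    # existing alternatives. The keyword argument is assumed, not verified for 0.10.0.
    engine = compile_model("model.onnx", batch_size=1, scheduler="elastic")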

Changes:

  • More robust architecture detection added to help resolve CPU topology, such as when running inside a virtual machine.
  • Tensor columns improved, leading to significant speedups of 5 to 20% in pruned YOLO (larger batch sizes), BERT (smaller batch sizes), MobileNet, and ResNet models.
  • Sparse quantized network performance improved on machines that do not support VNNI instructions.
  • Performance improved for dense BERT with large batch sizes.

Resolved Issues:

  • Possible crashes eliminated for:
    • Pooling operations with small image sizes
    • Rarely, networks containing convolution or GEMM operations
    • Some models with many residual connections

Known Issues:

  • None
deepsparse - DeepSparse v0.9.1 Patch Release

Published by jeanniefinks almost 3 years ago

This is a patch release for 0.9.0 that contains the following changes:

  1. YOLACT models and other models with constant outputs no longer fail with a mismatched shape error on multi-socket systems with batch sizes greater than 1. However, a corner case exists where a model with a constant output whose first dimension is equal to the (nonunit) batch size will fail.
  2. GEMM operations where the number of columns of the output matrix is not divisible by 16 will no longer fail with an assertion error.
  3. Broadcasted inputs to elementwise operators no longer fail with an assertion error.
  4. Int64 multiplications no longer fail with an illegal instruction on AVX2.
deepsparse - DeepSparse v0.9.0

Published by jeanniefinks almost 3 years ago

New Features:

  • Optimized support for Resize operators with pytorch_half_pixel and align_corners coordinate transformations.
  • Up-to-date version check implemented for DeepSparse.
  • YOLACT and DeepSparse integration added in examples/dbolya-yolact.

Changes:

  • The parameter for the number of sockets to use has been removed -- the Python interface now takes only the number of cores as a parameter (see the sketch after this list).
  • Tensor columns have been optimized. Users will see performance improvements specifically for pruned quantized BERT models:
    • The softmax operator can now take advantage of tensor columns.
    • Inference batch sizes that are not divisible by 16 are now supported.
  • Various performance improvements made to:
    • certain networks, such as YOLOv5, on AVX2 systems.
    • dense convolutions on some AVX-512 systems.
  • API docs recompiled.
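
Here is a minimal sketch of the cores-only interface described in the first item above, assuming the deepsparse.compile_model Python API; the ONNX path and core count are placeholders:

    from deepsparse import compile_model

    # The number of sockets is no longer a parameter; only the core count is given.
    engine = compile_model("model.onnx", batch_size=1, num_cores=4)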

Resolved Issues:

  • In rare circumstances, users could have experienced an assertion error when executing networks with depthwise convolutions.

Known Issues:

  • YOLACT models fail with a mismatched shape error on multi-socket systems with batch size greater than 1. This issue applies to any model with a constant output.
  • In some circumstances, GEMM operations where the number of columns of the output matrix is not divisible by 16 may fail with an assertion error.
deepsparse - DeepSparse v0.8.0

Published by jeanniefinks almost 3 years ago

New Features:

  • Tensor columns have been optimized, improving the performance of some networks.
    • This includes, but is not limited to, pruned and quantized YOLOv5s and BERT.
    • The optimization applies to networks with subgraphs composed of low-compute operations.
    • Batch size must be a multiple of 16.
  • Reduce operators have been further optimized in the Engine.
  • C++ API support is available for the DeepSparse Engine.

Changes:

  • Performance improvements made for low-precision (8 and 16-bit) datatypes on AVX2.

Resolved Issues:

  • Rarely, when several data arrangement operators were in a row, e.g., Reshape, Transpose, or Slice, assertion errors occurred.
  • When Pad operators were not followed by convolution or pooling, assertion errors occurred.
  • CPU threads migrated between cores when running benchmarks.

Known Issues:

  • None
deepsparse - DeepSparse v0.7.0

Published by jeanniefinks about 3 years ago

New Features:

  • Operators optimized for Engine support:
    • Where*
    • Cast*
    • IntegerMatMul*
    • QLinearMatMul*
    • Gather (for scalar indices)
      *optimized only for AVX-512 support
  • Flag created to disable any batch size overrides by setting the environment variable "NM_DISABLE_BATCH_OVERRIDE=1".
  • Warnings display when emulating quantized operations on machines without VNNI instructions.
  • Support added for Python 3.9.
  • Support added for ONNX versions 1.8 - 1.10.

Changes:

  • Performance improvements made for sparse quantized transformer models.
  • Documentation updates made for examples/ultralytics-yolo to include YOLOv5.

Resolved Issues:

  • Fixed a crash caused by an uninitialized memory read; a check is now in place before the memory is accessed.
  • Engine output_shape functions corrected on multi-socket systems when the output dimensions are not statically known.

Known Issues:

  • BERT models with quantized embeds currently segfault on AVX2 machines. Workaround is to run on a VNNI-compatible machine.
deepsparse - DeepSparse v0.6.1 Patch Release

Published by jeanniefinks about 3 years ago

This is a patch release for 0.6.0 that contains the following changes:

Users no longer experience crashes

  • when running the ReduceSum operation in the DeepSparse Engine.
  • when running operations on tensors that are 8- or 16-bit integers, or booleans, on AVX2.
deepsparse - DeepSparse v0.6.0

Published by jeanniefinks about 3 years ago

New Features:

Changes:

  • Performance improvements made for:
    - all networks when running on multi-socket machines, especially those with large outputs.
    - batched Softmax and Reduce operators with many threads available.
    - Reshape operators when multiple dimensions are combined into one or one dimension is split into multiple.
    - stacked matrix multiplications by supporting more input layouts.
  • YOLOv3 example integration was generalized to ultralytics-yolo in support of both V3 and V5.

Resolved Issues:

  • Engine now runs on architectures with more than one NUMA node per socket.

Known Issues:

  • None
deepsparse - DeepSparse v0.5.1 Patch Release

Published by jeanniefinks over 3 years ago

This is a patch release for 0.5.0 that contains the following changes:

  • Resolved an issue that caused a performance regression on YOLOv5 and could have affected the correctness of some models.
deepsparse - DeepSparse v0.5.0

Published by jeanniefinks over 3 years ago

New Features:

  • None

Changes:

  • Performance optimizations implemented for binary elementwise operations, where both inputs come from the same source buffer. One of the inputs may have intermediate unary operations.
  • Performance optimizations implemented for binary elementwise operations where one of the inputs is a constant scalar.
  • Small performance improvement for large batch sizes (> 64) on quantized ResNet.

Resolved Issues:

  • Removed a deepsparse num_sockets assertion that caused a crash when too many sockets were requested.
  • Fixed a rare assertion failure when a nonlinearity appeared between an elementwise addition and a convolution or GEMM.
  • Broken URLs for classification and detection examples updated in the contained READMEs.

Known Issues:

  • None
deepsparse - DeepSparse v0.4.0

Published by jeanniefinks over 3 years ago

New Features:

  • New operator support implemented for Expand.
  • Slice operator support added for positive step sizes. Only slice operations on a single axis are supported. Previously, Slice was supported only for constant tensors with a step size of one.

Changes:

  • Memory usage of compiled models reduced.
  • Memory layout for matrix multiplications in Transformers optimized.
  • Precision for swish and sigmoid operations improved.
  • Runtime performance improved for some networks whose outputs are immediately preceded by transpose operators.
  • Runtime performance of softmax operations improved.
  • Readme redesigned for better clarity on the repository's purpose.

Resolved Issues:

  • When using the multi-stream scheduler, selecting more threads than the number of cores on the system no longer causes a performance hit.
  • Neural Magic dependencies now upgrade to the intended bugfix versions instead of minor versions.

Known Issues:

  • None
deepsparse - DeepSparse v0.3.1 Patch Release

Published by jeanniefinks over 3 years ago

This is a patch release for 0.3.0 that contains the following changes:

  • Docs updated for new Discourse and Slack links
  • Check added for supported Python version so DeepSparse does not improperly install on unsupported systems
deepsparse - DeepSparse v0.3.0

Published by jeanniefinks over 3 years ago

New Features:

  • Multi-stream scheduler added as a configurable option to the engine.

Changes:

  • Errors related to setting the NUMA memory policy are now issued as warnings.
  • Improved compilation times for sparse networks.
  • Performance improvements made for: networks with large outputs and multi-socket machines; ResNet-50 v1 quantized and kernel sparsity GEMMs.
  • Optimized copy operations and the placement of quantization operations within networks.
  • Version is now loaded from the version.py file; the default build on branches is now nightly.
  • The cpu.py file and related APIs are now part of the DeepSparse repo instead of being copied over from the backend.
  • Added install errors for end users running on unsupported (non-Linux) systems.
  • YOLOv3 batch 64 quantized now has a speedup of 16% in the DeepSparse Engine.

Resolved Issues:

  • An assertion is no longer triggered when more sockets or threads than available are requested.
  • Resolved assertion when performing Concat operations on constant buffers.
  • Engine no longer crashes when the output of a QLinearMatMul operation has a dimension not divisible by 4.
  • The engine now starts without crashing on Windows Subsystem for Linux and Docker for Windows or Docker for Mac.

Known Issues:

  • None
deepsparse - DeepSparse v0.2.0

Published by jeanniefinks over 3 years ago

New Features:

  • None

Changes:

  • Dense convolutions on AVX2 systems were optimized, improving performance for many non-pruned networks. In particular, this results in a speed improvement for batch size 64 ResNet-50 of up to 28% on Intel AVX2 systems and up to 39% on AMD AVX2 systems.
  • Operations to shuffle activations in engine optimized, resulting in up to 14% speed improvement for batch size 64 pruned quantized MobileNetV1.
  • Performance improvements made for networks with large output arrays.

Resolved Issues:

  • Engine no longer fails with an assert when running some quantized networks.
  • Some Resize operators were not optimized if they had an ROI input.
  • Memory leak addressed on multi-socket systems when batch size > 1.
  • Docs and readme corrections made for minor issues and broken links.
  • Makefile no longer deletes files for docs compilation and cleaning.

Known Issues:

  • In rare cases where a tensor, used as the input or output to an operation, is larger than 2GB, the engine can segfault. Users should decrease the batch size as a workaround.

  • In some cases, models running complicated pre- or post-processing steps could diminish the DeepSparse Engine performance by up to a factor of 10x due to hyperthreading, as two engine threads can run on the same physical core. Address the performance issue by trying the following recommended solutions in order of preference:

    1. Enable thread binding.

    If that does not give a performance benefit or you want to try additional options:

    2. Use the numactl utility to prevent the process from running on hyperthreads.

    3. Manually set the thread affinity in Python as follows:

    import os
    from deepsparse.cpu import cpu_architecture

    # Detect the CPU architecture (vendor and core counts) of the host system.
    ARCH = cpu_architecture()

    if ARCH.vendor == "GenuineIntel":
        # Intel systems enumerate one logical CPU per physical core first,
        # so bind the current process (pid 0) to those first N logical CPUs.
        os.sched_setaffinity(0, range(ARCH.num_physical_cores()))
    elif ARCH.vendor == "AuthenticAMD":
        # AMD systems enumerate sibling hyperthreads in adjacent pairs,
        # so bind to every other logical CPU across the first 2*N of them.
        os.sched_setaffinity(0, range(0, 2 * ARCH.num_physical_cores(), 2))
    else:
        raise RuntimeError(f"Unknown CPU vendor {ARCH.vendor}")

deepsparse - DeepSparse v0.1.1 Patch Release

Published by jeanniefinks over 3 years ago

This is a patch release for 0.1.0 that contains the following changes:

  • Docs updates: tagline, overview, and updated verbiage to use "sparsification"
  • Examples updated to use new ResNet-50 pruned_quant moderate model from the SparseZoo
  • Nightly build dependencies now match on major.minor and not full version
  • Benchmarking script added for reproducing ResNet-50 numbers
  • Small (3-5%) performance improvement for pruned quantized ResNet-50 models, for batch size greater than 16
  • Reduced memory footprint for networks with sparse fully connected layers
  • Improved performance on multi-socket systems when batch size is larger than 1
deepsparse - DeepSparse v0.1.0 First GitHub Release

Published by jeanniefinks over 3 years ago

Welcome to our initial release on GitHub! Older release notes can be found here.

New Features:

  • Operator support enabled:
    • QLinearAdd
    • 2D QLinearMatMul when the second operand is constant
  • Multi-stream support added for concurrent requests.
  • Examples for benchmarking, classification flows, detection flows, and Flask servers added (a minimal engine-usage sketch follows this list).
  • Jupyter Notebooks for classification and detection flows added.
  • Makefile flows and utilities implemented for GitHub repo structure.
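
As a companion to the examples mentioned above, here is a minimal engine-usage sketch, assuming the deepsparse.compile_model Python API; the ONNX path and input shape are placeholders for an image-classification model:

    import numpy
    from deepsparse import compile_model

    batch_size = 1
    engine = compile_model("resnet50.onnx", batch_size=batch_size)  # placeholder model file

    # engine.run takes a list of numpy arrays, one per model input.
    inputs = [numpy.random.rand(batch_size, 3, 224, 224).astype(numpy.float32)]
    outputs = engine.run(inputs)
    print([out.shape for out in outputs])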

Changes:

  • Software packaging updated to reflect new GitHub distribution channel, from file naming conventions to license enforcement removal.
  • Initial startup message updated with improved language.
  • Distribution now manylinux2014 compliant; support for Ubuntu 16.04 deprecated.
  • QuantizeLinear operations now use division instead of scaling by reciprocal for small quantization scales.
  • Small performance improvements made on some quantized networks with nontrivial activation zero points.

Resolved Issues:

  • Networks with sparse quantized convolutions and nontrivial activation zero points now have consistent correct results.
  • Crash no longer occurs for some models where a quantized depthwise convolution follows a non-depthwise quantized convolution.

Known Issues:

  • None