armnn

Arm NN ML Software. The code here is a read-only mirror of https://review.mlplatform.org/admin/repos/ml/armnn

MIT License


armnn - Release 24.08 (Latest Release)

Published by orlmon01 about 2 months ago

Summary

New Features

  • Softmax implemented in TosaCommon and TosaRef.
  • MEAN implemented in TosaCommon and TosaRef.
  • REDUCE_SUM implemented in TosaCommon and TosaRef.
  • Activation:Gelu implemented in TosaRef.
  • ElementwiseUnary:Log implemented in TosaRef.
  • Pad implemented in TosaCommon and TosaRef.
  • ElementwiseUnary:Exp implemented in TosaRef.
  • BatchMatMul implemented in TosaCommon and TosaRef.
  • FullyConnected implemented in TosaCommon and TosaRef.
  • Activation:BoundedReLu implemented in TosaCommon and TosaRef.
  • Activation:ReLu implemented in TosaCommon and TosaRef.
  • DepthwiseConvolution2d implemented in TosaCommon and TosaRef.
  • Implemented quantized ElementwiseBinary Add, Max, Mul and Sub support in TosaCommon and TosaRef.

Bug Fixes

  • Fix floating point exception in PerAxisIterator.
  • Fix TFLite Parser & Opaque Delegate ExecuteNetwork incorrectly unloading runtime.
  • Fix StridedSliceOp out of bounds errors.
  • Fix not specified dimensionality errors in classic and opaque delegates.
  • Fix warnings when building ArmNN Delegate with GCC-14.1.0.
  • Fix ReshapeOp DTS Test Failures.
  • Fix ConstFloat DTS Test Failures.
  • Fix Broadcast DTS test failures.
  • Fix BatchMatMul DTS test failures.

Other Changes

  • Update to Arm NN documentation for 24.08 release.
  • Review and update documentation for the 24.08 release.
  • Android support for evaluate_network.sh.
  • Added Gemmlowp for fixed point arithmetic on small values.
  • Moved Arm NN repository to use CMake 3.22.
  • Added Numpy Support to Execute Network.

ABI/API Changes

No ABI breaking change occurred in ArmNN Core (libarmnn.so), so the Major version has not changed; only the minor version has been bumped (33.1.0 → 33.2.0).

No API breaking back-end changes have occurred during the implementation of 24.08.

Build Dependencies

Tools Supported Version
Git 2.17.1 or later
SCons 2.4.1 (Ubuntu) 2.5.1 (Debian)
Cmake 3.22.1
Tensorflow 2.15.0
Onnx 1.6.0
Flatbuffer 23.5.26
Protobuf 3.12.0
Android NDK r26b
mapbox/variant 1.2.0
cxxopts 3.1.1
doctest 2.4.6
fmt 7.0.1
ghc 1.3.2
half 1.12.0
mapbox/variant 1.1.0
stb 2.16
Gemmlowp 16e8662c34917be0065110bfcd9cc27d30f52fdf
armnn - Release 24.05

Published by KevinARM 5 months ago

Summary

New Features

  • ScatterNd Operator Implementation.
    • Added support to delegate and opaque delegate.
    • Added support to Serializer and Deserializer.
    • Added support to TFLite parser.
    • End to End tests added.
    • Added support for CpuRef and GpuAcc.
  • Added options to serialize networks in ExecuteNetwork.
  • Added a build option to enable the OpenMP scheduler in ACL and made it the default scheduler for ACL builds.
  • Added Boolean data type to Debug layer support.
  • Updated TOSA Common and TosaRef to use TOSA v0.80.
  • Updated the build-tool README to include macOS support.

Bug Fixes

  • ExecuteNetwork fix for abort after inference.
  • Fix for failing CTS Float16 tests.
  • Enable serialize-to-armnn only when ARMNN_SERIALIZER is on.
  • TosaCommon backend
    • In TosaCommon, modify the way the unique names for the inputs are generated.
    • CreateRescaleTosaOperator() modified.
    • Move ComputeSplitAxis() to backendsCommon/WorkloadUtils.
    • For LeakyRelu, add TosaRefEndToEndTests and enable FP16 in TOSA mapping.
    • Fix quantized Conv2d TOSA mapping.
  • Fixed inconsistent broadcast handling for the Comparison layer.
  • Remove limitations on zero scale value in quantization.
  • Fix failing fsrcnn test.
  • Fix broken link in the delegate README.
  • Fix runtime memory handling in delegate and Arm NN executor.
  • Remove use of std::clamp.
  • Syntax change to allow building on older compilers.
  • Assert audit and removal.

Other Changes

  • Deprecation notices for items to be removed in 24.08 release.
  • Review and update documentation for operators added in 24.05 release.
  • Update to Arm NN documentation for 24.05 release.
  • Update python pillow version.
  • Remove reference to 22.08 release in docker README.
  • Minor change to the printouts in ExecuteNetwork.
  • Enable build of execute network in build tool.
  • Arm NN build tool script update for delegate header and so files.

ABI/API Changes

No API breaking front-end changes have occurred during the implementation of 24.05.

No API breaking back-end changes have occurred during the implementation of 24.05.

Build Dependencies

Tools Supported Version
Git 2.17.1 or later
SCons 2.4.1 (Ubuntu) 2.5.1 (Debian)
Cmake 3.19.0 (Ubuntu) and 3.19.0 (Debian)
Tensorflow 2.15.0
Onnx 1.6.0
Flatbuffer 23.5.26
Protobuf 3.12.0
Android NDK r26b
mapbox/variant 1.2.0
cxxopts 3.1.1
doctest 2.4.6
fmt 8.3.0
ghc 1.3.2
half 1.12.0
stb 2.16
xxd 1.10
armnn - Release 24.02

Published by KevinARM 8 months ago

Summary

New Features

  • ArmNN to TOSA backend:
    • LeakyRelu Activation support added
    • Quantize support added
    • Maximum support added
    • Split support added
    • Resize Nearest Neighbour support added
  • GpuFsa Backend (Dynamic Fusion)
    • RESIZE/SCALE support added
    • CAST support added
    • POOL2d support added
    • SUB support added
    • ADD support added
    • DEPTHWISE CONVOLUTION 2D support added
    • CONVOLUTION 2D support added
  • Updated to Android NDK r26b
  • Updated to TensorFlow 2.15
  • Added optimization to remove reshape operators where possible to CL, Neon and Ref backends.

Bug Fixes

  • Removed implicit sign conversion which could cause compile errors
  • Fixed memory leak which only happens during profiling and reference Resize workload's align corners is true
  • Fixed build failures on C++ 14 compilers
  • Fixed build tool errors when building for Android target

Other Changes

  • Delegate Unit Tests are now only built for the backends which are being built
  • Increased end to end testing for two layer and three layer MaxPool2d
  • In ExecuteNetwork added support to serialize to dot graph for the Arm NN Delegates

ABI/API Changes

No API breaking front-end changes have occurred during the implementation of 24.02

No API breaking back-end changes have occurred during the implementation of 24.02

Build Dependencies

Tools Supported Version
Git 2.17.1 or later
SCons 2.4.1 (Ubuntu) 2.5.1 (Debian)
Cmake 3.19.0 (Ubuntu) and 3.19.0 (Debian)
ACL branches/arm_compute_24_02
android-nn-driver branches/android-nn-driver_24_02
Tensorflow 2.15.0
Onnx 1.6.0
Flatbuffer 23.5.26
Protobuf 3.12.0
Android NDK r26b
cxxopts SHA 12e496da3d486b87fa9df43edea65232ed852510
doctest 2.4.6
fmt 7.0.1
ghc 1.3.2
half 1.12.0
mapbox/variant 1.2.0
stb 2.16
xxd 1.10
armnn - Release 23.11

Published by nikraj01 11 months ago

Summary

New Features

  • Add support for BROADCAST_TO layer in CpuRef, and remove it when it is followed by ElementWise layer.
  • Add an optimization that fuses Add+Mul+Add+(Optional Relu) layers in CpuAcc.
  • Add support for GELU activation layer in CpuRef, CpuAcc, GpuAcc.
  • Upgrade Arm NN to Tensorflow 2.14
  • Add Signed64 support
  • Add support for Signed64 data type in Cast layer
  • Add a script that evaluates the performance of a network
  • Add ReverseV2 CL and Neon Workloads

TfLite Parser

  • Add support for BROADCAST_TO layer.
  • Add support for GELU activation layer.
  • Updating TfLite parser to ignore VALIDATION: subgraphs

Arm NN Serializer/Deserializer:

  • Add support for GELU activation layer.

Bug Fixes

  • Fix UnidirectionalSequenceLstm
  • Fix weights checking when converting in Support Library
  • Fix unsafe Usages of Memcpy in Armnn
  • Fix for -Wno-sign-conversion in profiling test in gcc9
  • Fix ElementwiseBinary missing from NeonBackend activation fusion optimization
  • Fix Reshape and concat invalid results
  • Remove unnecessary Prelu restriction in quantization
  • Remove unnecessary Square Difference restriction in quantization

Other Changes

  • Update the Arm NN Execute Network app --help
  • Introduce clang-format scripts to ArmNN
  • Remove profiling detail for ConstTensorAsInputs Layers
  • Install missing profiling headers
  • Remove ASSERTs from deserializer code
  • Remove ASSERTs from armnnUtils code
  • Remove ASSERTs from shim code
  • Update documentation to correct C++ version: C++ 17
  • Removing explicit block on non constant bias in NEON CONV2D, allowing Arm Compute Library to handle this.

ABI/API Changes

The following front-end API changes have occurred during the implementation of 23.11 that users should be aware of before upgrading. Due to these changes we have bumped our ARMNN_VERSION to 33.1.0 and our OPAQUE_DELEGATE_VERSION to 2.0.0, following Semantic Versioning guidelines.

Feature SHA Gerrit Review Resultant ABI/API changes
Add ArmNNSettings to Opaque Delegate 3e4b60897bde2ad7ab5b730c7c5d727e41cc0eef https://review.mlplatform.org/c/ml/armnn/+/10493 2 changes have occurred: TfLiteArmnnOpaqueDelegateCreate function has a different signature: Previously: TfLiteOpaqueDelegate* TfLiteArmnnOpaqueDelegateCreate(const void* settings); Now: TfLiteOpaqueDelegate* TfLiteArmnnOpaqueDelegateCreate(armnnDelegate::DelegateOptions options); Size of struct ArmnnDelegatePlugin has increased as a new private member has been added: armnnDelegate::DelegateOptions m_delegateOptions;
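
To make the signature change above concrete, here is a minimal, hedged sketch of creating and registering the opaque delegate with the new DelegateOptions-based call. The header paths and the one-argument DelegateOptions constructor are assumptions for illustration; only the TfLiteArmnnOpaqueDelegateCreate signature itself comes from the note above.

```cpp
// Sketch only: registering the Arm NN opaque delegate after the 23.11 signature change.
// Header paths below are assumptions and may differ in your build tree.
#include <armnn_delegate.hpp>              // assumed: declares TfLiteArmnnOpaqueDelegateCreate
#include <DelegateOptions.hpp>             // assumed: declares armnnDelegate::DelegateOptions
#include <tensorflow/lite/c/c_api.h>       // TfLiteInterpreterOptions* C API

TfLiteInterpreterOptions* CreateOptionsWithArmnnOpaqueDelegate()
{
    // 23.11 onwards: pass DelegateOptions by value instead of a const void* settings blob.
    armnnDelegate::DelegateOptions delegateOptions(armnn::Compute::CpuAcc);

    TfLiteOpaqueDelegate* armnnOpaqueDelegate = TfLiteArmnnOpaqueDelegateCreate(delegateOptions);

    TfLiteInterpreterOptions* interpreterOptions = TfLiteInterpreterOptionsCreate();
    TfLiteInterpreterOptionsAddDelegate(interpreterOptions, armnnOpaqueDelegate);

    // Note: the delegate must outlive any interpreter created with these options.
    return interpreterOptions;
}
```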

No API breaking back-end changes have occurred during the implementation of 23.11

TfLite Delegate

  • Add support for BROADCAST_TO layer to Classic and opaque delegate.
  • Add support for GELU activation layer to classic and opaque delegate.
  • Add ArmNNSettings parser function for Opaque Delegate.
  • Improve logging in the delegate

Bug Fixes

  • Reduce Sum uint8 failing. The fix was to treat only Reduce Prod Uint8 as a special case, as opposed to treating all uint8 reduce operations as a special case (kTfLiteAffineQuantization → kTfLiteNoQuantization)
  • Fix Issue with delegate supporting FP16 models
  • Delegate Test Suite: Fix reshape floating point exception
  • Delegate Test Suite: Fix default scale/offset issue
  • Delegate Test Suite: Fix ElementWise isnan assert
  • Delegate Test Suite: Fix Unspecified dimension while using ShapeInferenceMethod::ValidateOnly
  • Delegate Test Suite: Fix QuantizePerChannel tests
  • Delegate Test Suite: Fix Gather and GatherNd Tests in CpuRef

PyArmNN

  • Update requests version in PyArm NN
  • Bump Pillow version from 9.3.0 to 10.0.1

Build Dependencies

Tools Supported Version
Git 2.17.1 or later
SCons 2.4.1 (Ubuntu) 2.5.1 (Debian)
Cmake 3.19.0 (Ubuntu) and 3.19.0 (Debian)
Tensorflow 2.14.0
Onnx 1.6.0
Flatbuffer 23.5.26
Protobuf 3.12.0
Android NDK r25
mapbox/variant 1.2.0
cxxopts 3.1.1
doctest 2.4.6
fmt 8.3.0
ghc 1.3.2
half 1.12.0
stb 2.16
xxd 1.10
armnn - Release 23.08

Published by nikraj01 about 1 year ago

Summary

New Features

  • Added support for tile operator in CpuRef, CpuAcc, GpuAcc.
  • Added support for reverse_v2 operator in CpuRef.
  • Added pow and squared_difference as ElementWiseBinary layers in CpuRef, CpuAcc, and GpuAcc.
  • Added squared_difference, power and ceil to TypeUtils.hpp.
  • Enabled dynamic / non-constant bias for:
    • Fully-Connected layers in CpuAcc and GpuAcc
    • 3-D Convolutional layers in CpuAcc and GpuAcc
    • Depthwise Convolutional layers in GpuAcc
  • Added DataType to .dot files for constant layers.
  • Added BinaryElementwiseOperation to .dot files.
  • Added a FileComparisonExecutor to ExecuteNetwork.
  • Added an optional TensorInfo to InputSlot.
  • Added 3D tensors to batch_to_space and space_to_batch for CpuAcc and GpuAcc.
  • Added check for half-precision floating-point infinity values and backend support (FP16).
  • Added backend optimisations to remove reshape layers where possible.
  • Added data layout to tensors in NeonStridedSliceWorkload.
  • Added names to workloads.
  • Enabled slice end-to-end tests in all backends and Signed32 in CpuRef.
  • Added axis to ViewsDescriptor.
  • Refactored ElementBinaryOps to use ElementBinaryLayer.

TfLite Parser

  • Added reverse_v2 support to TFLite Parser.
  • Added tile to TFLite Parser.
  • Added square as mul in the TFLite Parser.
  • Check for options != null before adding fused activation in TFLite Parser.
  • Fixed segfault with some models in the TFLite Parser.

Arm NN Serializer/Deserializer:

  • Added tile to Serialiser/Deserialiser.
  • Added reverse_v2 to Serialiser/Deserialiser.

Support library

  • Added reverse_v2 to Support Library.
  • Added tile to Support Library.
  • Added cache-size check to Support Library.

Bug Fixes

  • Fixed incorrect validation of unidirectional_sequence_lstm on CL and Neon.
  • Fixed issue with ExecuteNetwork when running with TFLite Executor.
  • Replaced asserts with exceptions in Gather reference workload.
  • Introduced fix to explicitly state the correct header to be included (following prior deprecation warning).
  • Fixed XML parsing error in Arm NN Doxygen.
  • Fixed -Werror=unused-result error.
  • Introduced fix for ExecuteNetwork where --output-network-details-only was not working with -T delegate flag.
  • Introduced fix for duplicate definitions in cross-compilation build.
  • Fixed incorrect Concat permutation parameters in Support Library.
  • Removed unnecessary warnings for certain models.
  • Introduced fix to allow SplitterLayer to use overridden TensorInfos correctly.
  • Introduced fix for some cases where the use of sub-tensors was causing an error.
  • Fixed read memory access caused by missing printf arguments.
  • Introduced fix for failing dynamic backend build.
  • Fixed issue where the dimension's specificity didn't match the number of dimensions.
  • Fixed ambiguous method name in BackendHelper.
  • Introduced fix for segmentation fault when an input was directly connected to an output.
  • Fixed uninitialised variable error found during static analysis.
  • Fixed fault in ExecuteNetwork when a model file was passed without an extension.
  • Fixed GitHub issue where search bar was not working in Doxygen documentation.

Other Changes

  • Replaced use of std::filesystem with ghc::filesystem.
  • Refactored ConnectedToSplitterWithMoreThan4Dims function to a more generally useful ConnectedToLayerType function.
  • Customised Doxygen output.
  • Removal of deprecated code due to be removed in 23.08 or earlier:
    • INetworkProperties
    • SubgraphView
    • ILayerSupport
    • WorkloadFactory
  • Updated documentation with new operators in 23.08.
  • Audited the use of armnn_assert.

Known Issues

  • Intermittent issue on Dma Buf memory import on GPU. This is fixed in Mali Driver r30p0.
  • There might be performance regressions against v20.08 in Inception v3 using int8 data types on Arm Mali-G77 GPUs. Currently under investigation.

ABI/API Changes

The following front-end API changes have occurred during the implementation of 23.08 that users should be aware of before upgrading. Due to these changes we have bumped our ARMNN_VERSION to 33.0.0, following Semantic Versioning guidelines.

Feature SHA Gerrit Review Resultant ABI/API changes
Removal of Reshape 4cc341cf8b5a6e6bb0543504cbbfde6fa11a2cdb https://review.mlplatform.org/c/ml/armnn/+/9885 4 additional virtual methods added to class IInputSlot: SetTensorInfo ( TensorInfo ), GetTensorInfo ( ) const, IsTensorInfoSet ( ) const, IsTensorInfoOverridden ( ) const
Front end and reference implementation for TILE 79a06a59bafadf736ca53c4240e87f9bbb657260 https://review.mlplatform.org/c/ml/armnn/+/9920 LayerType enum has had the LastLayer member value changed from 72 to 74 The member Tile with value 74 has been added
Remove deprecated code 09e4d05b85cc5ed419d282cdfc0b153f83c3fa39 https://review.mlplatform.org/c/ml/armnn/+/9266 2 functions have been removed from the BatchMatMulDescriptor class: BatchMatMulDescriptor::GetAxesNotMul ( struct BatchMatMulDescriptor const& desc, TensorShape const& inputXShape, TensorShape const& inputYShape ) [static] BatchMatMulDescriptor::GetAxesToMul ( struct BatchMatMulDescriptor const& desc, TensorShape const& tensorXShape, TensorShape const& tensorYShape ) [static]
Remove deprecated code (INetworkProperties) b179382bcb4944d0137aa9799c3c56a2102ecda2 https://review.mlplatform.org/c/ml/armnn/+/10001 INetworkProperties structure has had the following fields removed: m_ExportEnabled m_ImportEnabled
Remove deprecated code (ILayerSupport) 66277031d8fb9588b5a9f3436b6a5f06173668a8 https://review.mlplatform.org/c/ml/armnn/+/10005 In ArmNN individual virtual IsXXXSupported() functions in the ILayerSupport class have been removed. This functionality has been replaced by a more ABI compliant model whereby an IsLayerSupported() function now accepts a LayerType argument. In ArmNNTestUtils, removal of 4 virtual methods from class MockLayerSupport: IsAdditionSupported ( TensorInfo const&, TensorInfo const&, TensorInfo const&, Optional<std::__cxx11::basic_string&> ) const IsConvolution2dSupported ( TensorInfo const&, TensorInfo const&, struct Convolution2dDescriptor const&, TensorInfo const&, Optional<TensorInfo> const&, Optional<std::__cxx11::basic_string&> ) const IsInputSupported ( TensorInfo const&, Optional<std::__cxx11::basic_string&> ) const IsOutputSupported ( TensorInfo const&, Optional<std::__cxx11::basic_string&> ) const
Remove deprecated code (WorkloadFactory) 7894ef93ad568250afda12e1b67bc5bfa4c0b41c https://review.mlplatform.org/c/ml/armnn/+/10006 In ArmNNTestUtils the MockWorkloadFactory class has had the following virtual method removed: CreateInput ( InputQueueDescriptor const&, struct WorkloadInfo const& ) const
Added Axis to ViewsDescriptor fca5916e4e6a44cf11b47328659d4d7ee95ec231 https://review.mlplatform.org/c/ml/armnn/+/10073 The size of the ViewsDescriptor structure has changed from 48 bytes to 56 bytes. Field m_IsAxisSet has been added.
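
As a usage illustration of the IInputSlot additions listed above (SetTensorInfo, GetTensorInfo, IsTensorInfoSet, IsTensorInfoOverridden), a minimal sketch follows. The surrounding network-building calls are standard Arm NN API; the exact return types and the override workflow shown are assumptions for illustration rather than a prescribed pattern.

```cpp
// Sketch only: overriding the TensorInfo seen by a consuming layer's input slot (23.08+).
#include <armnn/INetwork.hpp>
#include <armnn/Tensor.hpp>
#include <armnn/Types.hpp>

void OverrideInputSlotInfo(armnn::IConnectableLayer* producer,
                           armnn::IConnectableLayer* consumer)
{
    // Connect producer output 0 to consumer input 0 as usual.
    producer->GetOutputSlot(0).Connect(consumer->GetInputSlot(0));

    // New in 23.08: an optional TensorInfo can be set directly on the input slot,
    // overriding the info propagated from the connected output slot.
    armnn::TensorInfo overridden({1, 16, 16, 8}, armnn::DataType::Float32);
    consumer->GetInputSlot(0).SetTensorInfo(overridden);

    if (consumer->GetInputSlot(0).IsTensorInfoOverridden())
    {
        const armnn::TensorInfo& info = consumer->GetInputSlot(0).GetTensorInfo();
        (void)info; // use the overridden shape/data type here
    }
}
```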

The following back-end API changes have occurred during the implementation of 23.08 that users should be aware of before upgrading.

Feature SHA Gerrit Review Resultant ABI/API changes
Add names to workloads 7cbe78140a274cec783049051df7c7298b974f13 https://review.mlplatform.org/c/ml/armnn/+/9983 Pure virtual method GetName ( ) const has been added to IWorkload class. Size of CopyMemGenericWorkload class has been changed from 152 bytes to 184 bytes.
Remove deprecated code (SubgraphView) 0f3e9a09a90664fc7c6479f1d7b312a4671d9659 https://review.mlplatform.org/c/ml/armnn/+/10009 Removed the following methods from SubgraphView: GetInputSlot, GetInputSlots, GetLayers, GetOutputSlot, GetOutputSlots. Resulted in a change to return types for the following methods in SubgraphView: begin() returns SubgraphView::IConnectableLayerIterator, begin() const returns SubgraphView::ConstIConnectableIterator, cbegin() const returns SubgraphView::ConstIConnectableIterator, cend() const returns SubgraphView::ConstIConnectableIterator, end() returns SubgraphView::IConnectableLayerIterator, end() const returns SubgraphView::ConstIConnectableIterator.
Remove deprecated code (ILayerSupport) a5048344b5dc95cf305d7ffdc7390ef6df109f4c https://review.mlplatform.org/c/ml/armnn/+/10071 Made IWorkloadFactory::CreateWorkload a pure virtual function to force client to write its own implementation.
Fix coverity error on variable initialize b9b97922723772014236abb5d0d21c2f07adc578 https://review.mlplatform.org/c/ml/armnn/+/10075 Adjusted sequence of variable initialization in struct WorkloadInfo: m_Name m_WeightsTensorInfo m_BiasTensorInfo m_ConvolutionMethod
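
To illustrate the IWorkload::GetName() addition from the first row above, a minimal hedged sketch of logging workload names follows. The header path and the printable return type are assumptions; only the accessor name comes from the note above.

```cpp
// Sketch only: using the GetName() accessor added to IWorkload in 23.08.
// The header path is an assumption and may differ in your build tree.
#include <iostream>
#include <memory>
#include <vector>
#include <armnn/backends/Workload.hpp>

void LogWorkloadNames(const std::vector<std::unique_ptr<armnn::IWorkload>>& workloads)
{
    for (const auto& workload : workloads)
    {
        // GetName() is the pure virtual accessor added to IWorkload; assumed printable here.
        std::cout << workload->GetName() << std::endl;
    }
}
```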

TfLite Delegate

  • Extended support for 3D tensors (batch_to_space and space_to_batch) in CpuRef.
  • Added opaque delegate Options subsection to Doxygen.
  • Added layerNames to classic and opaque delegate.
  • Added reverse_v2 to classic and opaque delegates.
  • Added tile to delegate and opaque delegate.
  • Added leaky_relu to delegate.

Bug Fixes

  • Fixed versions in delegate "Quick Start" guide.
  • Opaque delegate cleanup.
  • Fixed failure on unidirectional_sequence_lstm operator.
  • Fixed instance where ExpandDims would not work where batch != 1.

PyArmNN

  • Updated PyArm NN to include new features added in Arm NN.
  • Added relevant deprecation message when building PyArm NN.

Build Dependencies

Tools Supported Version
Git 2.17.1 or later
SCons 2.4.1 (Ubuntu) 2.5.1 (Debian)
Cmake 3.19.0 (Ubuntu) and 3.19.0 (Debian)
Tensorflow 2.12.0
Onnx 1.6.0
Flatbuffer 2.0.6
Protobuf 3.12.0
Android NDK r25
mapbox/variant 1.2.0
cxxopts 3.1.1
doctest 2.4.6
fmt 8.3.0
ghc 1.3.2
half 1.12.0
stb 2.16
xxd 1.10
armnn - Release 23.05

Published by nikraj01 over 1 year ago

Summary

New Features

  • Added support for dynamic weights in CpuAcc and GpuAcc for FullyConnected workloads.
  • Added support for Crops in CpuAcc and GpuAcc BatchToSpaceND workloads.
  • Added support for int8 and changed the Compute Library kernel used for Fp32 in CpuAcc and GpuAcc Batch MatMul workloads.
  • Added Opaque TfLite Delegate which provides the same operator coverage as the existing/classic TfLite Delegate. More information can be found in the TfLite Delegate section below

TfLite Parser

  • Added support for CEIL and SPACE_TO_DEPTH operators.
  • Fixed bug where calculated output shape wasn't being recorded in ParseSqueeze.
  • Fixed segfault in ParseTransposeConv2d when output_shape is not constant.
  • Fixed bug where negative axis was being read incorrectly in ParseMean.
  • Calculate explicit padding for Transpose Convolution using output shape, if specified.

ONNX Parser

  • Added support for dimensions > 2 to MatMul/FullyConnected.

Arm NN Serializer/Deserializer:

  • Added support for CEIL.

Bug Fixes

  • Fixed compare-output output feature in ExecuteNetwork.
  • Fixed gcc 13 compiler errors.

Other Changes

  • Added ElementwiseBinaryLayer to replace Add, Div, Sub, Maximum, Mul and Minimum layers.
  • Updated build Android NDK guide (BuildGuideAndroidNDK.md).
  • Set default quantization parameter scale to 1.0, instead of 0.0.
  • Fp16ToFp32 and Fp32ToFp16 convert workloads now use arm_compute::NECast in CpuAcc backend, when available. This should in general be faster.
  • Added Opaque TfLite Delegate build option to the build-tool.

Known Issues

  • Intermittent issue on Dma Buf memory import on GPU. This is fixed in Mali Driver r30p0.
  • There might be performance regressions against v20.08 in Inception v3 using int8 data types on Arm Mali-G77 GPUs. Currently under investigation.

ABI/API Changes

The following front-end API changes have occurred during the implementation of 23.05 that users should be aware of before upgrading. Note: No ABI breaking change occurred in ArmNN Core (libarmnn.so) and so the Major version has not changed, only a bump in minor version (32.0.0 → 32.1.0).

Feature SHA Gerrit Review Resultant ABI/API changes
Implement Pimpl Idiom for Delegate Options 1bae865fecf99f25cd2d58390e0cf08467a22b4f https://review.mlplatform.org/c/ml/armnn/+/9358 Size of class DelegateOptions has been changed from 488 bytes to 8 bytes. Layout of parameter's stack of several functions has been changed and therefore parameters at higher positions in the stack may be incorrectly initialized by applications. Size of class Delegate has been changed from 552 bytes to 72 bytes. Size of field m_Options has been changed from 488 bytes to 8 bytes. The fields or parameters of such data type may be incorrectly initialized or accessed by old client applications.
Implement Pimpl Idiom for OptimizerOptions c5ee0d7460f1e0ec7e2b0639e3e8962934c4df09 https://review.mlplatform.org/c/ml/armnn/+/9369 The following functions have been changed to accept an OptimizerOptionsOpaque argument instead of the unstable OptimizerOptions. Note: OptimizerOptionsOpaque will accept an OptimizerOptions in its constructor, so this is not an API break, only an ABI break.
Changed functions:
  DelegateOptions::DelegateOptions [C1] ( enum armnn::Compute computeDevice, struct armnn::OptimizerOptions const& optimizerOptions, armnn::Optional<armnn::LogSeverity> const& logSeverityLevel, armnn::Optional<std::function<void(arm::pipe::ProfilingGuid, unsigned int, armnn::ITensorHandle*)>> const& func )
  DelegateOptions::DelegateOptions [C2] ( enum armnn::Compute computeDevice, struct armnn::OptimizerOptions const& optimizerOptions, armnn::Optional<armnn::LogSeverity> const& logSeverityLevel, armnn::Optional<std::function<void(arm::pipe::ProfilingGuid, unsigned int, armnn::ITensorHandle*)>> const& func )
  DelegateOptions::DelegateOptions [C1] ( std::vector<armnn::BackendId> const& backends, struct armnn::OptimizerOptions const& optimizerOptions, armnn::Optional<armnn::LogSeverity> const& logSeverityLevel, armnn::Optional<std::function<void(arm::pipe::ProfilingGuid, unsigned int, armnn::ITensorHandle*)>> const& func )
  DelegateOptions::DelegateOptions [C2] ( std::vector<armnn::BackendId> const& backends, struct armnn::OptimizerOptions const& optimizerOptions, armnn::Optional<armnn::LogSeverity> const& logSeverityLevel, armnn::Optional<std::function<void(arm::pipe::ProfilingGuid, unsigned int, armnn::ITensorHandle*)>> const& func )
  DelegateOptions::SetOptimizerOptions ( struct armnn::OptimizerOptions const& optimizerOptions )
Replacement functions:
  DelegateOptions::DelegateOptions [C1] ( enum armnn::Compute computeDevice, armnn::OptimizerOptionsOpaque const& optimizerOptions, armnn::Optional<armnn::LogSeverity> const& logSeverityLevel, armnn::Optional<std::function<void(arm::pipe::ProfilingGuid, unsigned int, armnn::ITensorHandle*)>> const& func )
  DelegateOptions::DelegateOptions [C2] ( enum armnn::Compute computeDevice, armnn::OptimizerOptionsOpaque const& optimizerOptions, armnn::Optional<armnn::LogSeverity> const& logSeverityLevel, armnn::Optional<std::function<void(arm::pipe::ProfilingGuid, unsigned int, armnn::ITensorHandle*)>> const& func )
  DelegateOptions::DelegateOptions [C1] ( std::vector<armnn::BackendId> const& backends, armnn::OptimizerOptionsOpaque const& optimizerOptions, armnn::Optional<armnn::LogSeverity> const& logSeverityLevel, armnn::Optional<std::function<void(arm::pipe::ProfilingGuid, unsigned int, armnn::ITensorHandle*)>> const& func )
  DelegateOptions::DelegateOptions [C2] ( std::vector<armnn::BackendId> const& backends, armnn::OptimizerOptionsOpaque const& optimizerOptions, armnn::Optional<armnn::LogSeverity> const& logSeverityLevel, armnn::Optional<std::function<void(arm::pipe::ProfilingGuid, unsigned int, armnn::ITensorHandle*)>> const& func )
  DelegateOptions::SetOptimizerOptions ( armnn::OptimizerOptionsOpaque const& optimizerOptions )
Add constant version of IConnectableLayer::GetConstantTensorsByRef aeec3ce5c8f936fb1220a9de8c84cceef88d4080 https://review.mlplatform.org/c/ml/armnn/+/9196 For class IConnectableLayer a pure virtual method GetConstantTensorsByRef ( ) const has been added. Applications will not provide the implementation for this pure virtual method and therefore cause a crash in the library trying to call this method. Note: The pure virtual function was added at the end so will not result in an ABI break. However you should usually not add new virtual functions for any reason, even to leaf classes, if the class is intended to remain binary compatible on Windows. Doing so may reorder existing virtual functions and break binary compatibility. As we do not target support for Windows we are not going to consider this an ABI break for ArmNN Core.
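
Because OptimizerOptionsOpaque accepts an OptimizerOptions in its constructor (per the Pimpl change above), existing delegate setup code should keep compiling with at most a small wrapper added. A minimal hedged sketch, with the delegate header name assumed:

```cpp
// Sketch only: passing optimizer options to the delegate after the 23.05 Pimpl change.
#include <armnn/INetwork.hpp>      // armnn::OptimizerOptions / armnn::OptimizerOptionsOpaque
#include <DelegateOptions.hpp>     // assumed: declares armnnDelegate::DelegateOptions

void ConfigureDelegateOptions(armnnDelegate::DelegateOptions& delegateOptions)
{
    armnn::OptimizerOptions legacyOptions;
    legacyOptions.m_ReduceFp32ToFp16 = false;

    // OptimizerOptionsOpaque can be constructed from the legacy struct,
    // so call sites stay source compatible (the ABI is what changed).
    armnn::OptimizerOptionsOpaque opaqueOptions(legacyOptions);
    delegateOptions.SetOptimizerOptions(opaqueOptions);
}
```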

The following back-end API changes have occurred during the implementation of 23.05 that users should be aware of before upgrading.

Feature SHA Gerrit Review Resultant ABI/API changes
Remove GetGraph and include of Graph.hpp header from public header c1c5f2a519458f498934fa3f2074acc86f9f2f42 https://review.mlplatform.org/c/ml/armnn/+/9351 Size of class OptimizationViews has been changed from 360 bytes to 88 bytes. Field m_Graph has been removed from this type. The fields or parameters of such data type may be incorrectly initialized or accessed by old client applications.

TfLite Delegate

  • The existing TfLite Delegate has now been renamed to "Classic" TfLite Delegate to accommodate the new Opaque TfLite Delegate. There has been a file restructure because of this.
  • The Opaque TfLite Delegate provides the same operator coverage as the existing/classic TfLite Delegate. A list of these supported operators can be found in the TfLite Delegate section of the documentation

New features

  • Added support for CEIL, EXPAND_DIMS and SQUEEZE operators.

Bug Fixes

  • Fixed layer support for Comparison, ElementWiseBinary and LogicalBinary operators, by expanding the TensorShape before verifying support, if required.
  • Fixed handling of a negative size value in the Slice operator.
  • Calculate explicit padding for Transpose Convolution using output shape, if specified.

Build Dependencies

Tools Supported Version
Git 2.17.1 or later
SCons 2.4.1 (Ubuntu) 2.5.1 (Debian)
Cmake 3.5.1 (Ubuntu) and 3.7.2 (Debian)
Tensorflow 2.12.0 (SHA 6f692f73cb2043b4a0b0446539cd8c15b3dd9220)
Onnx 1.6.0
Flatbuffer 2.0.6
Protobuf 3.12.0
Android NDK r25
mapbox/variant 1.2.0
cxxopts 3.1.1 (SHA eb787304d67ec22f7c3a184ee8b4c481d04357fd)
doctest 2.4.6
fmt 7.0.1
ghc 1.3.2
half 1.12.0
stb 2.16
xxd 1.10
armnn - Release 23.02

Published by nikraj01 over 1 year ago

New Features

  • Arm NN TOSA Backend
    • Added Concatenation support to TOSA Reference Backend.
    • Added Constant layer support to TOSA Reference Backend.
    • Added Convolution 2D support to TOSA Reference Backend.
    • Added Pooling2d support to TOSA Reference Backend.
    • Added Reshape support to TOSA Reference Backend.
    • Added RSqrt support to TOSA Reference Backend.
    • Added Slice support to TOSA Reference Backend.
    • Added Transpose Convolution 2D support to TOSA Reference Backend.
    • Added Subtraction and Multiplication support to TOSA Reference Backend.
  • Added support for GpuAcc BatchMatMul with FP32.
  • Extend BatchMatMul support for 4D tensors in GpuAcc.

ONNX Parser

  • Provide a CreateNetworkFromBinary method for the ONNX parser.

TfLite Parser:

  • Fixed issue in ParseReshape where the targetShape wasn't always calculated correctly.
  • Fixed issue in ParseFullyConnected where the wrong name was used for the ReshapeLayer.
  • Added an ExpandDims to the FullyConnected to ensure that we reshape the output correctly.

Bug Fixes

  • Fixed bug in ExecuteNetwork where input files were not used when input names were not given.
  • Fixed bug in delegate profiling in ExecuteNetwork with multiple iterations.
  • Fixed bug for CpuAcc and GpuAcc: the BuildArmComputePermutationVector() function needed to be rewritten to account for all possible permutation vectors.
  • Fixed an ExecuteNetwork unhandled exception when using option --import-inputs-if-aligned.
  • Fixed Arm NNAPI Support Library to fail gracefully if device is unavailable.
  • Fixed edge cases where some permute vectors for Arm Compute were not converted correctly.
  • Fixed bug where GPU backend options were not being correctly passed by our delegate.
  • Fixed bug when converting Constants with Per-Axis Quantization.
  • Fixed bug where a call to SubstituteSubgraph on a working copy of a subgraph in Optimize failed.
  • Fixed segfault in ExecuteNetwork when no operator is supported by Arm NN.
  • Fixed bug for slot replacement during UpdateSubgraphViewSlotPointers.
  • Fixed bug for ExecuteNetwork using delegate when output is boolean from comparison layer.

Other Changes

  • Disabled BF16-Turbo-Mode and removed conversion layers.
  • Added Arm NN include directory into build-tool output.
  • Code improvement through removal of unused includes.
  • Optimization of IsLayerSupported to reduce calls to it.
  • Removed deprecated code due to be removed in 23.02.
  • Changed Arm NN Support Library to use static libraries instead of object libraries.
  • Added option of static build of Execute Network.
  • Improved error handling when ExecuteNetwork creates a directory when -F option used.
  • Changed ArmNNExecutors to now share a single IRuntime, which allows ExecuteNetwork to create and run multiple Executors instead of one.
  • Added documentation relating to multithreading.

ABI/API Changes

The following front-end API changes have occurred during the implementation of 23.02 that users should be aware of before upgrading.

Feature SHA Gerrit Review Resultant ABI/API changes
Optimize the calling of IsLayerSupported(). 5383767a7a759c867235ab66bd71f88281e3bd06 https://review.mlplatform.org/c/ml/armnn/+/8742 In class IConnectableLayer: Pure virtual method SetBackendId (BackendId const&) has been added to this class. Applications will not provide the implementation for this pure virtual method and therefore cause a crash in the library trying to call this method. The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
When creating multiple Executors only the last one works fine 5446a4d6d02002515fc58fafe33d74ae6dca5787 https://review.mlplatform.org/c/ml/armnn/+/8997 In class Delegate: Size of this type has been changed from 688 bytes to 680 bytes. The fields or parameters of such data type may be incorrectly initialized or accessed by old client applications. Type of field m_Runtime has been changed from armnn::IRuntimePtr (16 bytes) to armnn::IRuntime* (8 bytes). Size of the inclusive type has been changed
Fix incorrect last layer in Types.hpp 6701daf754efbadcf95c969eee1ba57320763d84 https://review.mlplatform.org/c/ml/armnn/+/8944 In enum LayerType: Value of member LastLayer has been changed from 66 to 71. Applications may execute a wrong branch of code in the library and therefore change the behavior.
Change to MemorySource to keep it usable as a bit mask 1cebf4978bf7723aaf0501de5fb80a6ef77066bf https://review.mlplatform.org/c/ml/armnn/+/9053 In enum MemorySource: Value of member Gralloc has been changed from 5 to 8. Applications may execute a wrong branch of code in the library and therefore change the behavior.

The following back-end API changes have occurred during the implementation of 23.02 that users should be aware of before upgrading.

Feature SHA Gerrit Review Resultant ABI/API changes
Remove deprecated code due to be removed in 23.02 ec67a0f08e0f96a5aebf3cac65331c67f6649f5e https://review.mlplatform.org/c/ml/armnn/+/8319 In struct Convolution2dQueueDescriptor, DepthwiseConvolution2dQueueDescriptor and FullyConnectedQueueDescriptor: Field m_Bias has been removed from this type. Field m_Weight has been removed from this type. 1) Applications will access incorrect memory when attempting to access this field. 2) Size of the inclusive type has been changed. 3) The fields or parameters of such data type may be incorrectly initialized or accessed by old client applications. In class BaseWorkload<Convolution2dQueueDescriptor>, BaseWorkload<DepthwiseConvolution2dQueueDescriptor> and BaseWorkload<FullyConnectedQueueDescriptor>: 1) Size of the inclusive type has been changed. 2) Previous accesses of applications and library functions to this field and fields at higher positions of the structure definition may be broken. 3) The fields or parameters of such data type may be incorrectly initialized or accessed by old client applications.
Return INetwork* not INetworkPtr& from OptimizationViews::GetINetwork() 5b2145c92dabb68a0ec7ff65948f52d3fdcecf4a https://review.mlplatform.org/c/ml/armnn/+/8828 In OptimizationViews::GetNetwork(): Base type of return value has been changed from std::unique_ptr<INetwork, void(*)(INetwork*)> to INetwork*. Recompilation of a client program may be broken.
Allow working copy SubgraphView to get Original Slots 01f72693d39ed966ad06adadc8aac141bc395659 https://review.mlplatform.org/c/ml/armnn/+/8918 In class SubgraphView: Base class std::enable_shared_from_this has been added. 1) Size of the class has been changed from 160 bytes to 176 bytes. 2) The memory layout in this class has been shifted by 16 bytes. 3) The class has only inline or auto-generated constructors which will be copied to applications at compile time and will allocate an older memory layout. Call of any exported method of this class may access a memory outside the allocated objects or inside the older memory structure and result in crash or incorrect behavior of applications. 4) The memory layout and size of subclasses will be changed.

TfLite Delegate

New features

  • Added support for Slice operator.
  • Made change to allow constant tensors as inputs for input data in the delegate.

Bug Fixes

  • Fixed delegate fallback during VisitNode so that an ArmNN exception is now caught and the process is handed over to TFLite.
  • Added an ExpandDims to the FullyConnected to ensure that we reshape the output correctly.
  • Fixed delegate fallback when fused activation is unsupported.
  • Fixed uncaught warnings treated as errors in delegate release build.

PyArm NN

  • Add installation instructions for prebuilt binaries.

Build Dependencies

Tools Supported Version
Git 2.17.1 or later
SCons 2.4.1 (Ubuntu) 2.5.1 (Debian)
Cmake 3.5.1 (Ubuntu) and 3.7.2 (Debian)
Tensorflow 2.10.0
Onnx 1.6.0
Flatbuffer 2.0.6
Protobuf 3.12.0
Android NDK r25
mapbox/variant 1.2.0
cxxopts SHA 12e496da3d486b87fa9df43edea65232ed852510
doctest 2.4.6
fmt 7.0.1
ghc 1.3.2
half 1.12.0
stb 2.16
xxd 1.10
armnn - Release 22.11.01

Published by nikraj01 over 1 year ago

Summary

This is a patch release to fix an issue in the Arm NN Support Library encountered on Android phones where the OpenCL libraries could not be detected.

In this case the 22.11 release detected the issue and threw an exception, but the TensorFlow Lite runtime expected an error code, so fallback to the runtime was failing.

In this release an error code is returned when a misconfigured or missing OpenCL installation is encountered, and the TensorFlow Lite runtime takes over execution of the graph as expected.

This 22.11.01 release contains all the features of Arm NN 22.11 release. Please find release note for 22.11 here https://github.com/ARM-software/armnn/releases/tag/v22.11.

armnn - Release 22.11

Published by nikraj01 almost 2 years ago

Summary

New Features

  • ArmNN to TOSA backend:
    • Added TOSA Mappings backbone structure with support for Addition operator (Float32).
    • Implemented simple TOSA Reference Backend skeleton.
    • Implemented TosaRefBackend::OptimizeSubgraphView.
    • Integrated TOSA Serialization Library into Arm NN.
    • Integrated TOSA Reference Model into Arm NN.
  • BATCH_MATMUL:
    • Added adjoint and transpose parameters to BATCH_MATMUL layer and CpuRef workload.
    • Added support for BATCH_MATMUL to Arm NN Support Library.
    • Added support for BATCH_MATMUL FP32 to CpuAcc.
    • Added BATCH_MATMUL end to end tests.
  • Updated to Android NDK r25.
  • Updated to TensorFlow 2.10 and Flatbuffers 2.0.6.

TfLite Parser

  • Added BATCH_MATMUL to TFLite Parser.
  • Fixed bug in TFLite Parser failing to prepare model due to unspecified size buffer data for SLICE operator.
  • In the TFLite Parser we observed that in the BATCH_MATMUL layer, when the adjoint parameter was true, the mathematical calculation performed was a transpose, so the TFLite adjoint parameter is mapped to transpose in Arm NN.
  • Added support for RESHAPE when output 'shape_signature' parameter contains a value of -1 in TFLite Parser.

ArmNN Serializer/Deserializer

  • Added support for BATCH_MATMUL to Serializer/Deserializer.

Bug Fixes

  • Fixed bug in SubgraphView::SubstituteSubgraph where IOutputSlots were incorrectly overridden.
  • Fixed bug in ExecuteNetwork when iterations and input files are not matching.
  • Updated SubgraphView Selector to give deterministic results.
  • Fixed bug in ArmNNExecutor where errors from LoadNetwork were being ignored.
  • Fixed bug with debug mode not working correctly with Constant Tensors as Inputs.
  • Fixed incorrect kernel measurement in profiling output.
  • Fixed ExecuteNetwork for multiple outputs.
  • Make the AllowExpandedDims option work.
  • Fixed output format issue for int8 when using -w in ExecuteNetwork.

Other Changes

  • Added runtime options to Doxygen.
  • Added message deprecating the use of master branch. main branch is now used.
  • Removed deprecated code due to be removed in 22.08 as we could not do this in 22.08.
  • Removed deprecated code due to be removed in 22.11.
  • Delayed the removal of deprecated weights and bias by one release.
  • Generalized get_compute_library.sh usage.
  • Use ARMNN_VERSION for Support Library version String.
  • Removed aarch32 build from build-tool.
  • Forward declare ILocalPacketHandlerSharedPtr in IRuntime.hpp
  • Use stricter file extension check in CreateParser.

Note: Following the upgrades to Tensorflow 2.10 and Flatbuffers 2.0.6, a compiler that supports C++17 is now required. This will prevent compilation on some older operating systems, e.g. Debian 9.

ABI/API Changes

The following front-end API changes have occurred during the implementation of 22.11 that users should be aware of before upgrading.

Feature SHA Gerrit Review Resultant ABI/API changes
Remove deprecated code 22.08 48f9d5db00a245d08317130b10171337df0c1142 https://review.mlplatform.org/c/ml/armnn/+/8167 Removed Symbols: INetwork::AddConvolution2dLayer ( struct Convolution2dDescriptor const& convolution2dDescriptor, ConstTensor const& weights, Optional<ConstTensor> const& biases, char const* name ). INetwork::AddDepthwiseConvolution2dLayer ( struct DepthwiseConvolution2dDescriptor const& convolution2dDescriptor, ConstTensor const& weights, Optional<ConstTensor> const& biases, char const* name )
Implement simple TOSA Reference Backend skeleton ae8a6f528151a9e88236a92877be1e99aea69658 https://review.mlplatform.org/c/ml/armnn/+/8082 In class MockWorkloadFactory the following has changed: The relative position of virtual method CreateInput ( InputQueueDescriptor const&, struct WorkloadInfo const& ) const has been changed from 5 to 8. The relative position of virtual method CreateWorkload ( enum LayerType, struct QueueDescriptor const&, struct WorkloadInfo const& ) const has been changed from 8 to 7. The relative position of virtual method CreateTensorHandle ( TensorInfo const&, enum DataLayout, bool const ) const has been changed from 7 to 6. The relative position of virtual method CreateTensorHandle ( TensorInfo const&, bool const ) const has been changed from 6 to 5. The layout of v-table has been changed. Call of these virtual methods may result in crash or incorrect behavior of applications.
Fix AllowExpandedDims option 16c76d5db629d3ef7e4cb143bfa7e1d717e1d492 https://review.mlplatform.org/c/ml/armnn/+/8419 Added Symbols: INetwork::Create ( NetworkOptions const& networkOptions ) [static] INetwork::CreateRaw ( NetworkOptions const& networkOptions ) [static] Removed Symbols: INetwork::Create ( NetworkOptions networkOptions ) [static] INetwork::CreateRaw ( NetworkOptions networkOptions ) [static]. Effectively the parameters list has been changed for the above functions. The name of the appropriate symbol for these functions on binary level has been changed. This may cause undefined reference linker error in old client applications. struct OptimizerOptions: Field m_AllowExpandedDims has been added to this type. This field will not be initialized by old clients. NOTE: this field should be accessed only from the new library functions, otherwise it may result in crash or incorrect behavior of applications.
Add functionality to print output tensors to file 7bbf56598010041ea46c3fa9d32604db777ee26e https://review.mlplatform.org/c/ml/armnn/+/8421 struct OptimizerOptions: Field m_DebugToFile has been added at the middle position of this structural type. Layout of structure fields has been changed and therefore fields at higher positions of the structure definition may be incorrectly accessed by applications.
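
The removal of the weight-taking AddConvolution2dLayer overloads above reflects the wider ConstTensorsAsInput direction: weights and biases are now supplied through Constant layers connected to the convolution's extra input slots. A minimal hedged sketch follows; the descriptor values and the slot layout shown are illustrative assumptions.

```cpp
// Sketch only: building a Convolution2d layer with weights/bias as Constant layer inputs.
#include <armnn/INetwork.hpp>
#include <armnn/Descriptors.hpp>
#include <armnn/Tensor.hpp>

void AddConv2dWithConstInputs(armnn::INetwork& network,
                              armnn::IConnectableLayer* input,
                              const armnn::ConstTensor& weights,
                              const armnn::ConstTensor& bias)
{
    armnn::Convolution2dDescriptor desc;
    desc.m_BiasEnabled = true;
    desc.m_StrideX = 1;
    desc.m_StrideY = 1;

    // Descriptor-only overload: weights/bias are no longer passed here.
    armnn::IConnectableLayer* conv = network.AddConvolution2dLayer(desc, "conv");

    armnn::IConnectableLayer* weightsLayer = network.AddConstantLayer(weights, "conv_weights");
    armnn::IConnectableLayer* biasLayer    = network.AddConstantLayer(bias, "conv_bias");

    // Slot 0: data, slot 1: weights, slot 2: bias (assumed slot layout).
    input->GetOutputSlot(0).Connect(conv->GetInputSlot(0));
    weightsLayer->GetOutputSlot(0).Connect(conv->GetInputSlot(1));
    biasLayer->GetOutputSlot(0).Connect(conv->GetInputSlot(2));

    // Constant layer outputs need TensorInfos matching their tensors.
    weightsLayer->GetOutputSlot(0).SetTensorInfo(weights.GetInfo());
    biasLayer->GetOutputSlot(0).SetTensorInfo(bias.GetInfo());
}
```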

The following back-end API changes have occurred during the implementation of 22.11 that users should be aware of before upgrading.

Feature SHA Gerrit Review Resultant ABI/API changes
Implement simple TOSA Reference Backend skeleton ae8a6f528151a9e88236a92877be1e99aea69658 https://review.mlplatform.org/c/ml/armnn/+/8082 ILayerSupport.hpp: Changed pure virtual function IsChannelShuffleSupported to a virtual function. WorkloadFactory.hpp: The relative position of virtual function CreateInput(const InputQueueDescriptor& descriptor, const WorkloadInfo& info) const has been moved. The layout of v-table has been changed. Call of these virtual methods may result in crash or incorrect behavior of applications.
Fix AllowExpandedDims option 16c76d5db629d3ef7e4cb143bfa7e1d717e1d492 https://review.mlplatform.org/c/ml/armnn/+/8419 const has been added for the constructor OptimizationViews(const NetworkOptions& networkOptions = {}) : m_INetwork(INetwork::Create(networkOptions)). As a result the layout of v-table has been changed. Calls of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
Remove deprecated code 22.08 d1628bffe27db398ff5c67c2e20f89e729f8bc31 https://review.mlplatform.org/c/ml/armnn/+/8167 In WorkloadData.hpp, ResizeBilinearQueueDescriptor has been removed.

TfLite Delegate

New features

  • Added a no fallback mode to the TfLite Delegate. This should only be used for testing purposes.

Build Dependencies

Tools Supported Version
Git 2.17.1 or later
SCons 2.4.1 (Ubuntu) 2.5.1 (Debian)
Cmake 3.5.1 (Ubuntu) and 3.7.2 (Debian)
Tensorflow 2.10.0
Onnx 1.6.0
Flatbuffer 2.0.6
Protobuf 3.12.0
Android NDK r25
mapbox/variant 1.2.0
cxxopts SHA 12e496da3d486b87fa9df43edea65232ed852510
doctest 2.4.6
fmt 7.0.1
ghc 1.3.2
half 1.12.0
stb 2.16
xxd 1.10
armnn - Release 22.08

Published by nikraj01 about 2 years ago

Summary

New Features

  • Add Arm NN Support Library.
    • The Arm NN Support Library for Android NNAPI is a shared library which has all the functionalities of existing HAL drivers for Android NNAPI.
    • It is available from Android S.
    • It focuses on update-ability of ML operators.
    • A guide on how to build the Arm NN Support Library is available at armnn/shim/BuildGuideShimSupportLibrary.md.
    • SLTS (Support Library Test Suite) compliance.
  • Support for Batch MatMul in CpuRef.

TfLite Parser

  • Added support for LOG.
  • Added support for SIN.

ExecuteNetwork App Changes:

  • Refactor of ExecuteNetwork. Now input name, input type, output name, output type and model type are read from the model.

Arm NN Build Tool:

  • Introduced Arm NN Build Tool which consists of an official Arm NN Dockerfile for building Arm NN and Arm Compute Library (ACL).
  • This tool replaces the majority of our existing build guides as a user-friendly way to build Arm NN (and its dependencies) from scratch.
  • Tested on x86_64 (Intel) and aarch64 (Arm) build hosts for the Ubuntu platform.
  • Currently supports targeting Linux devices (from Ubuntu 18.04 onwards) on x86_64, aarch32 and aarch64 architectures.

Bug Fixes

  • Models in the .armnn format (serialized models) were failing in 22.05; this has been solved by adding the constant layers before the operator layers.
  • Neon fold padding into average pool 2D quantization bug fix.
  • Fix segmentation fault when running --bf16-turbo-mode on FPGA.

Other Changes

  • General documentation refactor and updates.
  • Added LICENSE.spdx for Arm NN.
  • Delayed backend deprecation from 22.11 to 23.08.

ABI/API Changes

The following front-end API changes have occurred during the implementation of 22.08 that users should be aware of before upgrading.


Feature SHA Gerrit Review Resultant ABI/API changes
Import inputs but don't export outputs fails 626bd90378670eb5fd76f94526395430b752ad9e https://review.mlplatform.org/c/ml/armnn/+/7661 Field m_ExportEnabled has been added to type OptimizerOptions. This field will not be initialized by old clients that have not been recompiled.
Get non-const IConnectableLayer from I/O slots 09fa24d2f4b0177d55800bd01ec52c337701ef0a https://review.mlplatform.org/c/ml/armnn/+/7835 Pure virtual method GetOwningIConnectableLayer ( ) has been added to classes IOutputSlot and IInputSlot. Applications will not provide the implementation for this pure virtual method and therefore cause a crash in the library trying to call this method. The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
Remove deprecated code 22.05 4d2eec0436f75d526c2ec25623ad73c8d1ee9ac3 https://review.mlplatform.org/c/ml/armnn/+/7712 Removed Symbols: IsCapabilitySupported ( BackendId const& backend, enum BackendCapability capability ) FullyConnectedDescriptor::GetNumViews ( ) const INetwork::Accept ( ILayerVisitor& visitor ) const Pure virtual method Accept ( ILayerVisitor& ) const has been removed from class IConnectableLayer. The layout of v-table has been changed. Call of this virtual method or any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
Modified SubgraphView returned by GetWorkingCopy() cea3d49619a87ffb81422c7e9383368baa93a408 https://review.mlplatform.org/c/ml/armnn/+/7852 Pure virtual method GetSlotIndex ( ) const has been added to class IInputSlot. Applications will not provide the implementation for this pure virtual method and therefore cause a crash in the library trying to call this method. The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
Update the async api to use ExecutionData 21a6a1a5b72907573eade6d232bfaf45a4c14c52 https://review.mlplatform.org/c/ml/armnn/+/7878 experimental::IWorkingMemHandle Pure virtual method GetExecutionDataAt ( unsigned int ) has been added to this class. Applications will not provide the implementation for this pure virtual method and therefore cause a crash in the library trying to call this method. The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications. Pure virtual method GetWorkingMemDescriptor ( LayerGuid ) has been removed from this class. The layout of v-table has been changed. Call of this virtual method or any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
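
As a usage illustration of the GetOwningIConnectableLayer() addition described in the "Get non-const IConnectableLayer from I/O slots" row above, a minimal sketch follows. The reference return type and the header are assumptions; only the method name and the classes it was added to come from the note.

```cpp
// Sketch only: navigating from a slot back to its owning layer via the 22.08 accessor.
// The reference return type is an assumption for illustration.
#include <iostream>
#include <armnn/INetwork.hpp>   // IConnectableLayer, IInputSlot, IOutputSlot

void PrintOwningLayerName(armnn::IInputSlot& slot)
{
    armnn::IConnectableLayer& owner = slot.GetOwningIConnectableLayer();
    std::cout << "Owning layer: " << owner.GetName() << std::endl;
}
```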

The following back-end API changes have occurred during the implementation of 22.08 that users should be aware of before upgrading.

Feature SHA Gerrit Review Resultant ABI/API changes
Update the async api to use ExecutionData 21a6a1a5b72907573eade6d232bfaf45a4c14c52 https://review.mlplatform.org/c/ml/armnn/+/8051/2 The following virtual functions have been added to class IBackendInternal: virtual ExecutionData CreateExecutionData(WorkingMemDescriptor&) const virtual void UpdateExecutionData(ExecutionData&, WorkingMemDescriptor&) const The layout of v-table has been changed. Call of this virtual method or any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications. The signature of IWorkload::ExecuteAsync() has changed, it now accepts ExecutionData& instead of WorkingMemDescriptor&.
Add GetMemoryRequirements to IWorkload 5e09080c3848fce5c39424dfac735d3281300aa4 https://review.mlplatform.org/c/ml/armnn/+/7886 The following virtual function has been added to class IWorkload: virtual armnn::Optional<armnn::MemoryRequirements> GetMemoryRequirements(). The layout of v-table has been changed. Call of this virtual method or any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
Modified SubgraphView returned by GetWorkingCopy() cea3d49619a87ffb81422c7e9383368baa93a408 https://review.mlplatform.org/c/ml/armnn/+/7852 The signature of SubgraphView::GetWorkingCopy() has changed, it has now been marked as const to reflect the fact that the graph represented by the working copy does not get altered.

TfLite Delegate

New features

  • Added support for LOG
  • Added support for SIN
  • Add JNI interface

Bug Fixes

  • Fix running MobileBERT on CpuRef
  • Only use the macro ARMNN_TFLITE_DELEGATE
  • Fix errors in DelegateQuickStartGuide.md

PyArmNN

  • Documentation update on running PyArmNN with the ONNX parser.

Build Dependencies

Tools Supported Version
Git 2.17.1 or later
SCons 2.4.1 (Ubuntu) 2.5.1 (Debian)
Cmake 3.19.0
Tensorflow 2.5.0
Onnx 1.6.0
Flatbuffer 1.12.0
Protobuf 3.12.0
Android NDK r20b
mapbox/variant 1.2.0
cxxopts SHA 12e496da3d486b87fa9df43edea65232ed852510
doctest 2.4.6
fmt 7.0.1
ghc 1.3.2
half 1.12.0
stb 2.16
armnn - Release 22.05.01

Published by nikraj01 over 2 years ago

Summary

New Features

This is a patch release of 22.05 where we have implemented Pooling3d custom operator for ArmNN TfLite Delegate. This feature is available in the 22.05 release branch itself (branches/armnn_22_05) and in the tag created for patch release v22.05.01.

armnn - Release 22.05

Published by nikraj01 over 2 years ago

Summary

New Features

  • ArmnnTestUtils is now versioned and under ABI compliance checker
  • Added support for Int32 CONCATENATION layer for CpuRef
  • Added support for Float32 Unidirectional Sequence LSTM layer for CpuAcc and GpuAcc
  • Added support for GatherNd for CpuRef, CpuAcc and GpuAcc
  • Added support for SQRT for CpuAcc and GpuAcc
  • Added support for Depthwise Convolution2d ConstTensorsAsInput for CpuRef, CpuAcc and GpuAcc
  • Added support for Conv2d ConstTensorsAsInput for CpuRef, CpuAcc and GpuAcc
  • Added support for Fully Connected ConstTensorsAsInput for CpuAcc and GpuAcc
  • Added support for MaxPool3D and AveragePool3D for CpuAcc and GpuAcc
  • Added support for L2Pooling3D for GpuAcc
  • Added support for UnidirectionalLSTM for CpuAcc
  • ConstTensorsAsInput: Optimizer Fix - FuseBatchNorm
  • ConstTensorsAsInput: Optimizer Fix - FoldPadIntoConvolution2d
  • ConstTensorsAsInput: Optimizer Fix - Fp32ToBf16 optimization

TfLite Parser

  • Added support for GatherNd
  • Added support for FloorDiv
  • Added support for UnidirectionalLSTM
  • Do not create Floor for FloorDiv layer when the data type is int32

ArmNN Serializer/Deserializer

  • Added support for GatherNd

ExecuteNetwork App Changes:

  • Added Reuse IO Buffers mode
  • Profiling details weights and bias JSON keys deprecated. Will be removed for 22.08

Bug Fixes

  • Fixed crashing in profiling
  • Fixed the issue with running SimpleSample app in Raspi
  • Removed MockBackend.hpp from armnn/src/backends/backendsCommon/test/ to solve problems when using Visual Studio in Windows
  • Fixed segfault in RefDepthwiseConvolution2d workload

Other Changes

  • ArmNN Baremetal
    • Change the namespace from armnn::profiling to arm::pipe

ABI/API Changes

The following front-end API changes have occurred during the implementation of 22.05 that users should be aware of before upgrading.


Feature SHA Gerrit Review Resultant ABI/API changes
Change the namespace from armnn::profiling to arm::pipe 5aa9fd7ac6bf8dad576fa4a0a32aa3dae98d11ab https://review.mlplatform.org/c/ml/armnn/+/7222 Pure virtual method GetOwningIConnectableLayer( ) const has been added to class IOutputSlot. Applications will not provide the implementation for this pure virtual method and therefore cause a crash in the library trying to call this method. The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications. The following functions have had a change in signature meaning they will not be recognized by old applications: BackendRegistry::SetProfilingService, IRuntime::RegisterDebugCallback. Type of field m_LocalPacketHandlers has been changed from std::vector<std::shared_ptr<profiling::ILocalPacketHandler>> to std::vector<std::shared_ptr<arm::pipe::ILocalPacketHandler>> in Runtime::CreateOptions::ExternalProfilingOptions. Type of return value has been changed from profiling::ProfilingGuid to arm::pipe::ProfilingGuid in OptimizedNetwork::GetGuid.
Replace ProfilingService includes with IProfilingService. af947729dc2aa7cdb6d4a716e2edf307710a8155 https://review.mlplatform.org/c/ml/armnn/+/7240 The following function has had a change in signature meaning it will not be recognized by old applications: BackendRegistry::SetProfilingService
Remove dependency on armnn::Exception classes from the Profiling code f9db3efe5ce2b989b59c47056e1b84b32d2f1100 https://review.mlplatform.org/c/ml/armnn/+/7280 Class armnn::BackendProfilingException has been moved to namespace arm::pipe; this will result in older applications not being able to find it.
Replace armnn::Optional with arm::pipe::Optional in profiling code decd08b89565b18067d229c8c25b6f3a3333c653 https://review.mlplatform.org/c/ml/armnn/+/7295 Class armnn::TimeoutException has been moved to namespace arm::pipe; this will result in older applications not being able to find it.
Add Unidirectional Sequence Lstm support to TFLite 5880b911bf4b7fd8308c93e299d77ac78f282c19 https://review.mlplatform.org/c/ml/armnn/+/7023 The following fields have been added to struct LstmDescriptor: m_CellIntermediateScale, m_ForgetIntermediateScale, m_HiddenStateScale, m_HiddenStateZeroPoint, m_InputIntermediateScale, m_OutputIntermediateScale. As a result, the size of the struct has been changed.
ConstTensorsAsInput: DepthwiseConvolution2d 0690265d83e5aa79bd174544a7b35330781619dd https://review.mlplatform.org/c/ml/armnn/+/7417 Pure virtual method VisitDepthwiseConvolution2dLayer ( IConnectableLayer const*, struct DepthwiseConvolution2dDescriptor const&, char const* ) has been added to this class. Applications will not provide the implementation for this pure virtual method and therefore cause a crash in the library trying to call this method. The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
ConstTensorsAsInput: Conv2d - FrontEnd b4dd5cc86d4eb841de670f0f102ede599e0d9c40 https://review.mlplatform.org/c/ml/armnn/+/7382 Pure virtual method VisitConvolution2dLayer ( IConnectableLayer const*, struct Convolution2dDescriptor const&, char const* ) has been added to this class. Applications will not provide the implementation for this pure virtual method and therefore cause a crash in the library trying to call this method. The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.

The following back-end API changes have occurred during the implementation of 22.05 that users should be aware of before upgrading.

Feature SHA Gerrit Review Resultant ABI/API changes
Move headers to profiling/client/include 277618302d0f131eac0b6ac2015dd3eb09aa6ff9 https://review.mlplatform.org/c/ml/armnn/+/7327 Headers have been moved to profiling/client/include.
Change the namespace from armnn::profiling to arm::pipe 5aa9fd7ac6bf8dad576fa4a0a32aa3dae98d11ab https://review.mlplatform.org/c/ml/armnn/+/7222 Namespace changed from armnn::profiling to arm::pipe

TfLite Delegate

New features

  • Added support for GatherNd

Bug Fixes

Note: Arm NN is aware of an issue where converting a model to .armnn will yield unpredictable results when it is read back in through the deserializer. This is because the serializer depends on graph topology and the graph can be out of order; the graph becomes out of order because of the additional constant layers, created by the parsers, that act as inputs.

PyArmNN

  • Added support for GatherNd
  • Added Pooling3D

Build Dependencies

Tools Supported Version
Git 2.17.1 or later
SCons 2.4.1 (Ubuntu) 2.5.1 (Debian)
Cmake 3.5.1 (Ubuntu) and 3.7.2 (Debian)
Tensorflow 2.5.0
Onnx 1.6.0
Flatbuffer 1.12.0
Protobuf 3.12.0
Android NDK r20b
mapbox/variant 1.2.0
cxxopts SHA 12e496da3d486b87fa9df43edea65232ed852510
doctest 2.4.6
fmt 7.0.1
ghc 1.3.2
half 1.12.0
stb 2.16

Android 12 Compatibility Testing was performed using the following:

Android Tag Android Build ID Mali Driver Android Compatibility Test Suite Android Vendor Test Suite
android-12.0.0_r1 SP1A.210812.015 r36p0_01eac0-rc0 12_r2 (7987736) 12_r2 (7973604)

Android 11 Compatibility Testing was performed using the following:

Android Tag Android Build ID Mali Driver Android Compatibility Test Suite Android Vendor Test Suite
android-11.0.0_r6 RPM1.210413.002 r33p0_01eac0 11_r5 (7640833) 11_r5 (7599184)

Android 10 Compatibility Testing was performed using the following:

Android Tag Android Build ID Mali Driver
android-10.0.0_r39 QQ3A.200605.002.A1 R23P0_01REL0
armnn - Release 22.02

Published by nikraj01 over 2 years ago

Summary

New Features

  • Add mirror padding support on Pad layer for CpuAcc and GpuAcc.
  • Add support for Pool3d FrontEnd, Reference implementation.

TfLite Parser

  • Added missing support for reshape operator when the target shape is dynamic and batch size is unknown.
  • Added PadV2 support.
  • Changed asserts to CHECK in ParserFlatbuffersFixture.hpp.

ArmNN Serializer/Deserializer

  • Add support for Pool3d.

Bug Fixes

  • Added bounds checking when indexing PermutationVector elements and its correspondent unit tests.
  • Fixed output bindings in ExecuteNetwork when using delegate with models with multiple outputs.
  • Fixed build issues in x86 Dockerfile.
  • Fixed ExNet prints inference time twice.
  • Fixed thread safety issues in TimelineDecoder and associated unit tests.
  • Fixed some Thread Sanitizer warnings.
  • Added check for existing event to fix issue on OpenCL Timer.
  • Fixed logging bug where blank messages were being sent.
  • Fixed issues on Logging API.
  • Fixed async execute test on 32-bit Raspberry Pi.

Other Changes

  • Removed references to blacklist from Model Accuracy tool.
  • Removed deprecated code.
  • Added ModelOptions and additional timing to ARMNN_LOG.
  • Added get_tensorflow.sh script.
  • Updated build guides.
  • Updated error messages from the flatbuffers parser.
  • Added the C++ KWS example.
  • Handled optional biases better in Neon/Cl FullyConnected workloads.
  • Stabilise the Backend API:
    • Backend developers should now be able to limit includes to headers in include/armnn/backends/
    • Moved CompatibleTypes.hpp to the armnnUtils library.
    • Added forwarding header for src/armnn/CompatibleTypes.hpp.
    • Moved the ArmNN Test Utils code to a physically separate directory.
    • Added new method AddPrecompiledLayer() to INetwork.
    • Promoted backend headers in backendCommon to armnn/backends.
    • Used INetwork rather than Graph for holding layers for OptimizationViews.
    • Used IConnectableLayer in SubgraphView rather than Layer in its m_Layers.
    • Stabilised the IWorkloadFactory interface with unified strategy.
    • Stabilised the ILayerSupport interface with unified strategy.
    • Moved SubgraphView to backends include folder.
    • Added GetParameters to IConnectableLayer.
    • Exposed a new MockWorkloadFactory and MockMemManager.
    • Accessing ConstTensors from IConnectableLayer
    • Added method of returning a GetSubgraphWorkingCopy (SubgraphView).
    • Moved MemCopyTestImpl from acl to armnnTestUtils.
  • Support Import of Aligned Host Memory in NNAPI (a usage sketch follows this list):
    • Added CanBeImported to ITensorHandle.
    • Implemented CanBeImported function in RefTensorHandle.
    • Implemented CanBeImported function in NeonTensorHandle.
    • Implemented CanBeImported function in ClTensorHandle.
    • Added functionality for CopyAndImportFactoryPair to TensorHandleFactoryRegistry.
    • Register CopyAndImportFactoryPairs to RefBackend and unit tests.
    • Register CopyAndImportFactoryPairs to NeonBackend and unit tests.
    • Register CopyAndImportFactoryPairs to ClBackend and unit tests.
    • Added ReplaceTensorHandle functions to IWorkload and BaseWorkload.
    • Added ClBaseWorkload and NeonBaseWorkload.
    • Modified workloads to extend Neon/Cl BaseWorkload.
    • Added ReplaceTensorHandle functions to Neon/CL BaseWorkloads.
    • Implemented ICLTensorProxy.
    • Added input and output workload slot pairs to LoadedNetwork.
    • Added support of aligned host memory.
    • Added Forced Import EndToEnd tests to Ref, Neon, and CL.
    • Call Cl sync after EnqueueWorkload.
    • Added EndToEnd tests on reference backend to ensure allocated data can be reused.
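
As a rough illustration of the pre-imported buffer flow described above, the sketch below follows the IRuntime::ImportInputs()/ImportOutputs() and EnqueueWorkload() signatures listed in the ABI/API table further down. The binding ids, the MemorySource value and the assumption that every input and output is pre-imported are illustrative only, not a verified recipe.

    #include <armnn/ArmNN.hpp>
    #include <utility>
    #include <vector>

    // Sketch only: 'optNet' is an already optimised network whose single input and output
    // bindings use id 0. MemorySource::Malloc requests import of correctly aligned host memory.
    // (Later releases may also require the constant flag to be set on inputInfo for ConstTensor.)
    void RunWithImportedBuffers(armnn::IRuntime& runtime,
                                armnn::IOptimizedNetworkPtr optNet,
                                void* alignedInput, void* alignedOutput,
                                const armnn::TensorInfo& inputInfo,
                                const armnn::TensorInfo& outputInfo)
    {
        armnn::NetworkId netId;
        runtime.LoadNetwork(netId, std::move(optNet));

        armnn::InputTensors  inputs  {{ 0, armnn::ConstTensor(inputInfo, alignedInput) }};
        armnn::OutputTensors outputs {{ 0, armnn::Tensor(outputInfo, alignedOutput) }};

        // Import once; the returned ids can be reused across several executions.
        std::vector<armnn::ImportedInputId>  importedIn =
            runtime.ImportInputs(netId, inputs, armnn::MemorySource::Malloc);
        std::vector<armnn::ImportedOutputId> importedOut =
            runtime.ImportOutputs(netId, outputs, armnn::MemorySource::Malloc);

        // With everything pre-imported, no regular tensors need to be passed.
        runtime.EnqueueWorkload(netId, {}, {}, importedIn, importedOut);
    }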

ABI/API Changes

The following front-end API changes have occurred during the implementation of 22.02 that users should be aware of before upgrading.


Feature SHA Gerrit Review Resultant ABI/API changes
SubgraphView uses IConnectableLayer rather than Layer in its m_Layers 56ccf68c7858560f2ba00f19076b3cb112970881 https://review.mlplatform.org/c/ml/armnn/+/6807 Pure virtual method GetOwningIConnectableLayer( ) const has been added to class IOutputSlot. Applications will not provide the implementation for this pure virtual method and therefore cause a crash in the library trying to call this method. The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
Stabilize the ILayerSupport interface with unified strategy. 34b429c2215bab7fd12b761dd5c200414c1b4a5b https://review.mlplatform.org/c/ml/armnn/+/6903 Virtual descriptor added to the struct BaseDescriptor, as a result the size of all descriptors has been changed. The fields or parameters of such data type may be incorrectly initialized or accessed by old client applications.
SubgraphView: Add method of returning a GetSubgraphWorkingCopy. 9d74ba6e85a043e9603445e062315f5c4965fbd6 https://review.mlplatform.org/c/ml/armnn/+/6995 Pure virtual method GetOwningIConnectableLayer( ) const has been added to class IInputSlot. Applications will not provide the implementation for this pure virtual method and therefore cause a crash in the library trying to call this method. The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
Add support of aligned host memory e2af6f4322a1e2b8b3c391fb721a6a80c281477f https://review.mlplatform.org/c/ml/armnn/+/7025 The following functions have had a change in signature meaning they will not be recognized by old applications: IRuntime::EnqueueWorkload() accepts two new parameters preImportedInputIds and preImportedOutputIds. IRuntime::ImportInputs() accepts a new parameter forceImportMemorySource. IRuntime::ImportOutputs() accepts a new parameter forceImportMemorySource.
Add GetParameters to IConnectableLayer e46659669b753411421a6a552b32b9f1d27b8b2e https://review.mlplatform.org/c/ml/armnn/+/7031 Pure virtual method GetParameters ( ) const has been added to class IConnectableLayer. Virtual method IsNull ( ) const has been added to class BaseDescriptor. Applications will not provide the implementation for this pure virtual method and therefore cause a crash in the library trying to call this method. The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
Accessing ConstTensors from IConnectableLayer 2e24175c683bca42496104591d6b702dad360b8e https://review.mlplatform.org/c/ml/armnn/+/7040 Pure virtual method GetConstantTensorsByRef ( ) has been added to class IConnectableLayer. Applications will not provide the implementation for this pure virtual method and therefore cause a crash in the library trying to call this method. The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
Remove deprecated code 22.02 b28e525233d43b2aaea4da56acdbe9914cb41b5b https://review.mlplatform.org/c/ml/armnn/+/7104 Deprecated LayerSupport.hpp and included IsXXXLayerSupported() functions have been removed as they have been replaced with ABI Stable ILayerSupport interface and the BackendHelper.hpp GetILayerSupportByBackendId() function.

The following back-end API changes have occurred during the implementation of 22.02 that users should be aware of before upgrading.

Feature SHA Gerrit Review Resultant ABI/API changes
Add a Pooling3d FrontEnd and Ref Implementation 7b885b3cce70154596b1994b013ea91527117c26 https://review.mlplatform.org/c/ml/armnn/+/6511 ILayerSupport.hpp Pure virtual function IsPooling3dSupported added requiring implementation by backend developers.
Stabilize the ILayerSupport interface with unified strategy. 34b429c2215bab7fd12b761dd5c200414c1b4a5b https://review.mlplatform.org/c/ml/armnn/+/6903 ABI stable virtual function IsLayerSupported(const LayerType& type, ...) has been added to ILayerSupport.hpp. The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
Stabilize the IWorkloadFactory interface with unified strategy 611c7fb97412230d5cefee047081455fb60db06c https://review.mlplatform.org/c/ml/armnn/+/6906 ABI stable virtual function CreateWorkload(const LayerType& type, ...) has been added to class IWorkloadFactory. The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.

TfLite Delegate

New features

  • Added Delegate cross compile to x86 Dockerfile
  • Added constant input supports for Pack/Stack, Concatenation operators
  • Added Int32 support to Pack/Stack operator on CpuRef
  • Removed unsupported operator: Gather
  • Added missing qasymms8 output type

Bug Fixes

  • Fixed reshape.

PyArmNN

  • Added support for the following operations: ChannelShuffle, Cast, Convolution3D, LogicalBinary, MirrorPad, Reduce, Shape, and Transpose (Permute).
  • Removed references to blacklist and whitelist.
  • Added the Python KWS example.

Build Dependencies

Tools Supported Version
Git 2.17.1 or later
SCons 2.4.1 (Ubuntu) 2.5.1 (Debian)
Cmake 3.5.1 (Ubuntu) and 3.7.2 (Debian)
Tensorflow 2.5.0
Onnx 1.6.0
Flatbuffer 1.12.0
Protobuf 3.12.0
Android NDK r20b
mapbox/variant 1.2.0
cxxopts SHA 12e496da3d486b87fa9df43edea65232ed852510
doctest 2.4.6
fmt 7.0.1
ghc 1.3.2
half 1.12.0
stb 2.16

Android 12 Compatibility Testing was performed using the following:

Android Tag Android Build ID Mali Driver Android Compatibility Test Suite Android Vendor Test Suite
android-12.0.0_r1 SP1A.210812.015 r35p0_01eac0 12_r1_arm64 (7698606) 12_r1_arm64 (7698606)

Android 11 Compatibility Testing was performed using the following:

Android Tag Android Build ID Mali Driver Android Compatibility Test Suite Android Vendor Test Suite
android-11.0.0_r6 RPM1.210413.002 r32p0_01eac0 11_r5 (7640833) 11_r5 (7599184)

Android 10 Compatibility Testing was performed using the following:

Android Tag Android Build ID Mali Driver
android-10.0.0_r39 QQ3A.200605.002.A1 R23P0_01REL0
armnn - Release 21.11

Published by nikraj01 almost 3 years ago

Arm NN 21.11 was focused on providing new capabilities and improving performance:

New Features

  • Added support for Reduce Prod.
  • Added support for Channel Shuffle.
  • Added support for Conv3d.
  • Added support for Symmetric and Reflect Padding on CpuRef backend.
  • Added support for statically linking ArmNN TfLite Delegate against Tensorflow Lite.
  • Added Import Input/Output functions to async API, allowing for imported I/O buffers to be used by multiple network executions.
  • Added external memory manager that allows for customization of network memory management ( Note: currently only fully supported on the CpuRef Backend ).

TfLite Parser

  • Added support for Reduce Prod.
  • Added support for Conv3d.
  • Added support for MirrorPad.
  • Added support for size of -1 for Slice.

ONNX Parser

  • Add support for Concat
  • Add support for Gather
  • Add support for Gemm
    • The parser supports constant bias or non-constant bias where bias dimension = 1.
  • Add support for Shape
  • Add support for Unsqueeze
  • Add support of min/max as attribute for Clip

ArmNN Serializer/Deserializer

  • Add support for Reduce Prod.
  • Add support for Channel Shuffle.
  • Add support for Conv3d.
  • Add support for Symmetric and Reflect Padding.

ExecuteNetwork App Changes

  • Added 'do-not-print-output' option to ExecuteNetwork.

Bug Fixes

  • Using output-network-details or output-network-details-only during ExecuteNetwork profiling created an invalid JSON format. This has since been fixed.
  • Fixed undefined reinterpret_cast in BFloat16.hpp. It fixes gcc builds with version 8 or above.
  • Fixed format of the delegate JSON output.
  • Fixed bug related with constant tensor flag.
  • Fixed pyarmnn py35 unit tests.

Other Changes

  • Added sample app for asynchronous execution.
  • Printed new Optimize and LoadedNetwork profiling points.
  • Added new serialized model supported on Netron.
  • Made it possible for backends to add include paths in Android.
  • Changed order of the Doxygen tree.

ABI/API Changes

The following front-end API changes have occurred during the implementation of 21.11 that users should be aware of before upgrading. Due to these changes we have bumped our ARMNN_VERSION to 27.0.0, the Delegate to 25.0.0 and also bumping our Parsers to 24.3.0 following Semantic Versioning guidelines.


Feature SHA Gerrit Review Resultant ABI/API changes
Remove deprecated code 1b2654fb799c3d25ffcef4d31b5d026d359e2f8f https://review.mlplatform.org/c/ml/armnn/+/6254 Removed Symbols: INetwork::AddAbsLayer ( char const* name ) INetwork::AddDepthwiseConvolution2dLayer ( struct DepthwiseConvolution2dDescriptor const& convolution2dDescriptor, ConstTensor const& weights, ConstTensor const& biases, char const* name ) INetwork::AddDepthwiseConvolution2dLayer ( struct DepthwiseConvolution2dDescriptor const& convolution2dDescriptor, ConstTensor const& weights, char const* name ) INetwork::AddEqualLayer ( char const* name ) INetwork::AddGatherLayer ( char const* name ) INetwork::AddGreaterLayer ( char const* name ) INetwork::AddMergerLayer ( MergerDescriptor const& mergerDescriptor, char const* name ) INetwork::AddResizeBilinearLayer ( struct ResizeBilinearDescriptor const& descriptor, char const* name ) INetwork::AddRsqrtLayer ( char const* name ) LayerSupport::IsMergerSupported ( BackendId const& backend, std::vector<TensorInfo const*> inputs, TensorInfo const& output, struct OriginsDescriptor const& descriptor, char* reasonIfUnsupported, size_t reasonIfUnsupportedMaxLength ) LayerSupport::IsResizeBilinearSupported ( BackendId const& backend, TensorInfo const& input, TensorInfo const& output, char* reasonIfUnsupported, size_t reasonIfUnsupportedMaxLength ) LayerSupport::IsRsqrtSupported ( BackendId const& backend, TensorInfo const& input, TensorInfo const& output, char* reasonIfUnsupported, size_t reasonIfUnsupportedMaxLength ) LayerSupport::IsSplitterSupported ( BackendId const& backend, TensorInfo const& input, struct ViewsDescriptor const& descriptor, char* reasonIfUnsupported, size_t reasonIfUnsupportedMaxLength ) Removed pure virtual methods, resulting in change to v-table layout: ILayerVisitor::VisitAbsLayer ILayerVisitor::VisitEqualLayer ILayerVisitor::VisitGatherLayer ILayerVisitor::VisitGreaterLayer ILayerVisitor::VisitMergerLayer ILayerVisitor::VisitResizeBilinearLayer ILayerVisitor::VisitRsqrtLayer Removed DataTypes: DataType::QuantisedAsymm8 DataType::QuantisedSymm16 DataType::QuantizedSymm8PerAxis
'IMemoryOptimizerStrategy Add strategy library and add support in BackendRegistry' b8a26d8f497f92643288a4c519af4d230ede1d7e https://review.mlplatform.org/c/ml/armnn/+/6297 struct IRuntime::CreationOptions: Member variable m_MemoryOptimizerStrategyMap has been added, changing the size of the type. class BackendRegistry: Member variable m_MemoryOptimizerStrategyMap has been added, changing the size of the type.
Add missing runtime parameters to TfLite delegate. 3e32a8700bf12d3b70d2824c12cdae907bde9360 https://review.mlplatform.org/c/ml/armnn/+/6388 class Delegate: Size of field m_Options has been changed from 136 bytes to 352 bytes. class DelegateOptions had the following fields added and so the size of the inclusive type has been changed. Field m_DynamicBackendsPath has been added to this type. Field m_EnableGpuProfiling has been added to this type. Field m_InternalProfilingDetail has been added to this type. Field m_InternalProfilingEnabled has been added to this type. Field m_ProfilingOptions has been added to this type. Field m_SerializeToDot has been added to this type.
Profiling instrumentation throughout the Optimizer f1e0ad38f1bc064e78e795f57db23901cf13f4ce https://review.mlplatform.org/c/ml/armnn/+/6432 struct OptimizerOptions: Field m_ProfilingEnabled has been added to this type. class Delegate: Size of this class has been increased from 416 bytes to 424 bytes. class DelegateOptions: Size of this class has been increased from 352 bytes to 360 bytes. Objects of these classes can be allocated by the applications and old size will be hardcoded at the compile time. Call of any exported constructor will break the memory of neighboring objects on the stack or heap. This is due to addition of m_ProfilingEnabled to the OptimizerOptions used in constructors of both Delegate classes.
Fix armnn_external_delegate option parsing b1c62f11881e0d528bea5b3664a8f36e4c03b508 https://review.mlplatform.org/c/ml/armnn/+/6519 class Delegate: Size of field m_Options has been changed from 360 bytes to 616 bytes. class DelegateOptions: Field m_RuntimeOptions has been added to this type. Field m_BackendOptions has been removed from this type. Field m_DynamicBackendsPath has been removed from this type. Field m_EnableGpuProfiling has been removed from this type. Objects of these classes can be allocated by the applications and old size will be hardcoded at the compile time. Call of any exported constructor will break the memory of neighboring objects on the stack or heap.
Support the new memory API in loaded network b1aad4270fa8ad5c4aa62e27d564baf723b2cee5 https://review.mlplatform.org/c/ml/armnn/+/6552 class INetworkProperties: Field m_ExternalMemoryManagementEnabled has been added to this type. The fields or parameters of such data type may be incorrectly initialized or accessed by old client applications.

The following back-end API changes have occurred during the implementation of 21.11 that users should be aware of before upgrading.

Feature SHA Gerrit Review Resultant ABI/API changes
Remove deprecated code 1b2654fb799c3d25ffcef4d31b5d026d359e2f8f https://review.mlplatform.org/c/ml/armnn/+/6254 IBackendInternal.hpp Removed Symbols: virtual ISubGraphConverterPtr CreateSubGraphConverter(const std::shared_ptr& subGraph) const; virtual Optimizations GetOptimizations() const; virtual SubGraphUniquePtr OptimizeSubGraph(const SubGraph& subGraph, bool& optimizationAttempted) const; Removed Aliases: GraphUniquePtr, SubgraphViewUniquePtr, ISubGraphConverterPtr, SubGraphUniquePtr ILayerSupport.hpp Removed Symbols: IsEqualSupported IsGatherSupported IsGreaterSupported IsMergerSupported IsResizeBilinearSupported IsRsqrtSupported IsSplitterSupported
Add Channel Shuffle Front end and Ref Implementation 51f67776a695c217a32596af806afeeb080f5528 https://review.mlplatform.org/c/ml/armnn/+/6211 ILayerSupport.hpp Pure virtual function IsChannelShuffleSupported added requiring implementation by backend developers.
Add Conv3d FrontEnd and Ref Implementation b63a31170aee1d28267d83a4bc67b57708fb6b05 https://review.mlplatform.org/c/ml/armnn/+/6338 ILayerSupport.hpp Pure virtual function IsConvolution3dSupported added requiring implementation by backend developers.

TfLite Delegate

New features

  • Added support for Reduce Prod.
  • Added support for Conv3d.
    • Conv3d is only currently supported in the TfLite Delegate when compiling with TensorFlow 2.6 and above.
  • Added support for Floor Div.
  • Added support for MirrorPad.

Build Dependencies

Tools Supported Version
Git 2.17.1 or later
SCons 2.4.1 (Ubuntu) 2.5.1 (Debian)
Cmake 3.5.1 (Ubuntu) and 3.7.2 (Debian)
Tensorflow 2.5.0
Onnx 1.6.0
Flatbuffer 1.12.0
Protobuf 3.12.0
Android NDK r20b
mapbox/variant 1.2.0
cxxopts SHA 12e496da3d486b87fa9df43edea65232ed852510
doctest 2.4.6
fmt 7.0.1
ghc 1.3.2
half 1.12.0
stb 2.16

Note: We have also added an Arm NN Android Library as a new experimental feature. It allows you to easily integrate Arm NN into an Android app. Please find the .aar file in the Asset section.

Android 12 Compatibility Testing was performed using the following:

Android Tag Android Build ID Mali Driver Android Compatibility Test Suite Android Vendor Test Suite
android-12 SP1A.210812.003 r34p0_01eac0 12_r1 (eng.upr473.20210901.005349) 12_r1 (eng.upr473.20210901.024841)
android-12 SP1A.210812.003 r32p1_01eac0 12_r1 (eng.upr473.20210901.005349)1 12_r1 (eng.upr473.20210901.024841)

1: CtsNNAPITestCases with Mali Driver r32p1_01eac0. The following test is known to be failing: AddTwoWithHardwareBufferInputWithGPUUsage. Investigations indicate this failure is due to Android NN HAL utilizing Gralloc functionality not required by the Gralloc API. This issue has been raised with Google Android team, and is tracked as https://partnerissuetracker.corp.google.com/issues/202025253. Please quote Arm reference MIDCET-3783 when discussing this issue.

Android 11 Compatibility Testing was performed using the following:

Android Tag Android Build ID Mali Driver Android Compatibility Test Suite Android Vendor Test Suite
android-11.0.0_r6 RPM1.210413.002 r33p0_01eac0 11_r3 (7127450) 11_r3 (7137996)

Android 10 Compatibility Testing was performed using the following:

Android Tag Android Build ID Mali Driver
android-10.0.0_r39 QQ3A.200605.002.A1 R23P0_01REL0
armnn - Release 21.08

Published by nikraj01 about 3 years ago

Summary

Arm NN 21.08 was focused on providing new capabilities and improving performance:

  • Added the ability to import protected DMA buffers, allowing Arm NN to run inferences on data held in protected GPU memory, along with a Custom Memory Allocator which supports importing malloc, dma_buf and protected DMA buffers.
  • Users with multi-core NPUs have been given the ability to pin inferences to selected cores, allowing them to balance parallel workloads across the NPU and increase throughput.
  • Boost has been completely removed from the code base making Arm NN easier to integrate into other software stacks.
  • Added support for non-constant weights and biases on FullyConnected, which lays the groundwork for supporting more models.
  • More operators supported on Arm NN, TfLite Parser, TfLite Delegate and Android NNAPI driver.

New Features

  • Moved unit tests from BOOST to doctest.
  • UNIDIRECTIONAL_SEQUENCE_LSTM Operator support added on CpuRef backend.
  • Changed weights layout for Depthwise Convolution Operator from [M,I,H,W] to [1,H,W,I*M].
  • Reduce Operator can now support multiple axes.
  • Optimisation added to fuse PAD Operator into Depthwise Convolution Operator.
  • Added SIN and LOG support to ElementWiseUnary Operator on CpuRef, CpuAcc (Only LOG is supported) and GpuAcc backends.
  • Added SHAPE Operator support on CpuRef backend.
  • Moved useful test utilities to new static library (libarmnnTestUtils.a).
  • Added ability to create multiple LoadedNetworks from one OptimizedNetwork.
  • Arm NN TfLite Delegate Image Classification sample application added to samples directory.
  • Added fully comprehensive Arm NN Operator list page to Doxygen.
  • Added support to allow Arm NN to run inferences that are in Protected GPU Memory.
    • Creation of Protected Memory is handled via a Custom Memory Allocator which supports importing malloc, Dma_buf and protected DMA buffers.
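
As a very rough illustration of the Custom Memory Allocator mentioned above, the sketch below registers a trivial malloc-backed allocator for the GpuAcc backend. The method names (allocate/free/GetMemorySourceType), the header path and the shape of m_CustomAllocatorMap are assumptions based on the interface described in these notes and in the ABI table below; a protected-memory or dma_buf allocator would need real buffer management behind it.

    #include <armnn/ArmNN.hpp>
    #include <armnn/backends/ICustomAllocator.hpp>   // assumed header location
    #include <cstdlib>
    #include <memory>

    // Sketch only: a trivial aligned-malloc allocator. A protected-memory or dma_buf
    // allocator would return the appropriate MemorySource and manage real buffers.
    class MallocCustomAllocator : public armnn::ICustomAllocator
    {
    public:
        void* allocate(size_t size, size_t alignment) override
        {
            // aligned_alloc requires size to be a multiple of alignment.
            size_t rounded = ((size + alignment - 1) / alignment) * alignment;
            return std::aligned_alloc(alignment, rounded);
        }
        void free(void* ptr) override { std::free(ptr); }
        armnn::MemorySource GetMemorySourceType() override { return armnn::MemorySource::Malloc; }
    };

    armnn::IRuntimePtr CreateRuntimeWithCustomAllocator()
    {
        armnn::IRuntime::CreationOptions options;
        // Assumption: m_CustomAllocatorMap maps a backend id to the allocator it should use.
        options.m_CustomAllocatorMap["GpuAcc"] = std::make_shared<MallocCustomAllocator>();
        return armnn::IRuntime::Create(options);
    }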

TfLite Parser

  • EXPAND_DIMS Operator support added.
  • PRELU Operator support added.
  • SHAPE Operator support added.
  • Comparison Operator support added (EQUAL, GREATER, GREATER_EQUAL, LESS, LESS_EQUAL and NOT_EQUAL).
  • Changed weights layout for Depthwise Convolution Operator from [M,I,H,W] to [1,H,W,I*M].
  • Added support for shape_signature, which will now be the preferred way to detect dynamic tensors.
    • If creating an instance of the ITfLiteParser and the model used is dynamic, then please ensure that m_InferAndValidate is set in the TfLiteParserOptions and m_shapeInferenceMethod is set to InferAndValidate in the OptimizerOptions.
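
A minimal sketch of the two settings mentioned in the last point; the model path, backend list and surrounding setup are placeholders, and the exact plumbing may differ slightly between releases.

    #include <armnn/ArmNN.hpp>
    #include <armnnTfLiteParser/ITfLiteParser.hpp>

    armnn::IOptimizedNetworkPtr ParseAndOptimiseDynamicModel(armnn::IRuntime& runtime)
    {
        // Ask the parser to infer and validate shapes for dynamic tensors.
        armnnTfLiteParser::ITfLiteParser::TfLiteParserOptions parserOptions;
        parserOptions.m_InferAndValidate = true;
        auto parser = armnnTfLiteParser::ITfLiteParser::Create(parserOptions);

        armnn::INetworkPtr network = parser->CreateNetworkFromBinaryFile("model.tflite"); // placeholder path

        // Match the behaviour in the optimizer.
        armnn::OptimizerOptions optimizerOptions;
        optimizerOptions.m_shapeInferenceMethod = armnn::ShapeInferenceMethod::InferAndValidate;

        return armnn::Optimize(*network,
                               { armnn::Compute::CpuAcc, armnn::Compute::CpuRef },
                               runtime.GetDeviceSpec(), optimizerOptions);
    }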

ArmNN Serializer/Deserializer

  • Changed weights layout for Depthwise Convolution Operator from [M,I,H,W] to [1,H,W,I*M].
  • Added SIN and LOG support to ElementWiseUnary Operator.
  • UNIDIRECTIONAL_SEQUENCE_LSTM Operator support added.

ExecuteNetwork App Changes

  • Added option to specify what size Arm NN thread pool to use when running inferences asynchronously.
  • Added support for qasymms8 (int8) and added qasymmu8 (uint8) as alias for qasymm8.
  • Added option to specify different input data for every iteration of ExecuteNetwork.
  • Added option to print additional information such as the TensorInfo, Descriptor and Convolution method when profiling is enabled.

NOTE: To run dynamic models through ExecuteNetwork the --infer-output-shape flag should be set.

Bug Fixes

  • Removed duplicate check for Dequantize input type when checking if operator is supported.
  • Fixed undefined behaviour in PolymorphicDowncast.
  • Fixed binding of reference to null pointer in RefFullyConnectedWorkload.
  • Fixed PermutationVector.end() to cope with dimensions < 5 in PermutationVector class.
  • Fixed cl_ext.h include path in CL backend.
  • Fixed bugs in PreCompiledLayer, e.g. a new shared_ptr was being created instead of allowing std::move to convert the unique_ptr into a shared_ptr.
  • Fixed gcc 9.3.0 compiler warning in TfLiteParser.
  • Fixed issue so that the BackendRegistry is cleaned up correctly following negative tests.

Other Changes

  • Print Elementwise and Comparison Operator descriptors in a dot graph.
  • Added IsConstant flag to TensorInfo. This should be set if using the new AddFullyConnectedLayer Graph API when weights and bias are constant. An example of this can be found in samples/SimpleSample.cpp; a short sketch also follows this list.
  • Added support for qasymms8 (int8) and added qasymmu8 (uint8) as alias for qasymm8 to ImageTensorGenerator.
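
A short sketch of the IsConstant/AddFullyConnectedLayer pattern, loosely modelled on what samples/SimpleSample.cpp is said to demonstrate; the shapes and data are placeholders and the SetConstant() accessor is an assumption about how the new flag is exposed.

    #include <armnn/ArmNN.hpp>
    #include <vector>

    // Sketch only: weights supplied as a constant input layer to the new
    // AddFullyConnectedLayer() overload, with the IsConstant flag set on the weights TensorInfo.
    void AddConstantWeightFullyConnected(armnn::INetwork& network, armnn::IConnectableLayer* input)
    {
        std::vector<float> weightData(4 * 8, 1.0f);                        // placeholder weights
        armnn::TensorInfo weightsInfo({4, 8}, armnn::DataType::Float32);
        weightsInfo.SetConstant();                                          // assumption: marks the tensor constant

        armnn::ConstTensor weights(weightsInfo, weightData);
        armnn::IConnectableLayer* weightsLayer = network.AddConstantLayer(weights, "weights");
        weightsLayer->GetOutputSlot(0).SetTensorInfo(weightsInfo);

        armnn::FullyConnectedDescriptor descriptor;
        descriptor.m_BiasEnabled = false;
        armnn::IConnectableLayer* fc = network.AddFullyConnectedLayer(descriptor, "fc");

        input->GetOutputSlot(0).Connect(fc->GetInputSlot(0));               // data input
        weightsLayer->GetOutputSlot(0).Connect(fc->GetInputSlot(1));        // weights as a layer input
    }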

ABI/API Changes

The following front-end API changes have occurred during the implementation of 21.08 that users should be aware of before upgrading. Due to these changes we have bumped our ARMNN_VERSION to 26.0.0 while also bumping our Parsers and Delegate to 24.2.0 following Semantic Versioning guidelines.

Feature SHA Gerrit Review Resultant ABI/API changes
Rework the async threadpool f364d5391b08e9071cd965f5765385ec9156b652 https://review.mlplatform.org/c/ml/armnn/+/5801 Be aware that these classes are in the experimental namespace and should be treated as such. struct INetworkProperties: Field m_NumThreads has been removed from the middle position of this structural type. Size of this type has been changed from 32 bytes to 24 bytes. class IWorkingMemHandle: Pure virtual method GetInferenceId ( ) has been removed from this class. class IAsyncExecutionCallback: The following methods have been removed: GetEndTime ( ) const, GetStartTime ( ) const, Wait ( ) const, GetStatus ( ) const
Add IsConstant flag to TensorInfo b082ed076b489f17bad3663005801b251d642108 https://review.mlplatform.org/c/ml/armnn/+/5842 class TensorInfo: Size of this class has been increased from 80 bytes to 88 bytes. This is due to the addition of private member bool m_IsConstant. An object of this class can be allocated by applications which the old size will be hardcoded at original compile time. Call of any exported constructor will break the memory of neighboring objects on the stack or heap. struct BindingPointInfo: Size of field m_TensorInfo has been changed from 80 bytes to 88 bytes. The fields or parameters of such data type may be incorrectly initialized or accessed by old client applications.
Add protected mode to ArmNN CreationOptions 15fcc7ed3163c9d4b1856955271854198c3c2696 https://review.mlplatform.org/c/ml/armnn/+/5963 struct IRuntime::CreationOptions: Field m_ProtectedMode has been added at the middle position of this structural type. Size of the inclusive type has been changed. Layout of structure fields has been changed and therefore fields at higher positions of the structure definition may be incorrectly accessed by applications.
Add the Custom Memory Allocator interface definition 801e2d55de7a02b98f3d77dc9775b10b2bd9f16b https://review.mlplatform.org/c/ml/armnn/+/5967 struct IRuntime::CreationOptions: Field m_CustomAllocator has been added at the middle position of this structural type. Size of the inclusive type has been changed. Layout of structure fields has been changed and therefore fields at higher positions of the structure definition may be incorrectly accessed by applications.
Add front end support for UnidirectionalSequenceLstm on ArmNN 8ed39ae450a077c7e4d672b5f05ff1d68ee67aab https://review.mlplatform.org/c/ml/armnn/+/5956 struct LstmDescriptor: Field m_TimeMajor has been added to this type. This field will not be initialized by old clients. Size of the inclusive type has been changed.
JSON profiling output 554fa09a0f3d6c9c572634c9d2de9bfb6c3218b0 https://review.mlplatform.org/c/ml/armnn/+/5968 struct INetworkProperties: Field m_ProfilingEnabled has been added to this type. This field will not be initialized by old clients.
ConstTensorsAsInput: FullyConnected 81beae3a870004795275e9266bc43d845b9f78db https://review.mlplatform.org/c/ml/armnn/+/5942 class ILayerVisitor: Pure virtual method VisitFullyConnectedLayer ( IConnectableLayer const*, struct FullyConnectedDescriptor const&, char const* ) has been added to this class. The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications. The following previously deprecated functions have been removed: INetwork::AddFullyConnectedLayer(struct FullyConnectedDescriptor const& fullyConnectedDescriptor, ConstTensor const& weights, ConstTensor const& biases, char const* name) and INetwork::AddFullyConnectedLayer(struct FullyConnectedDescriptor const& fullyConnectedDescriptor, ConstTensor const& weights, char const* name)
Adds CustomAllocator interface and Sample App c1c872f12797ef6fe52c4589113e7efc353e56eb https://review.mlplatform.org/c/ml/armnn/+/5987 struct IRuntime::CreationOptions: Field m_CustomAllocatorMap has been added at the middle position of this structural type. Size of the inclusive type has been changed. Layout of structure fields has been changed and therefore fields at higher positions of the structure definition may be incorrectly accessed by applications. class BackendRegistry: Field m_CustomMemoryAllocatorMap has been added to this type. Size of this type has been changed from 80 bytes to 136 bytes.
Allow profiling details to be switched off during profiling f487486c843a38fced90229923433d09f99fc2e5 https://review.mlplatform.org/c/ml/armnn/+/6069 struct INetworkProperties: Field m_OutputNetworkDetails has been added at the middle position of this structural type. Layout of structure fields has been changed and therefore fields at higher positions of the structure definition may be incorrectly accessed by applications.

The following back-end API changes have occurred during the implementation of 21.08 that users should be aware of before upgrading.

Feature SHA Gerrit Review Resultant ABI/API changes
Refactor the reporting of capabilities from backends b9af86ea42568ade799ee5529137e4756977b6c6 https://review.mlplatform.org/c/ml/armnn/+/5728 class IBackendInternal: virtual function GetCapabilities() const has been added, replacing the now deprecated HasCapability() function. The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
Add protected mode to ArmNN CreationOptions 15fcc7ed3163c9d4b1856955271854198c3c2696 https://review.mlplatform.org/c/ml/armnn/+/5963 class IBackendInternal: virtual function UseCustomMemoryAllocator() has been added. The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.

TfLite Delegate

New features

  • PRELU Operator Support added.
  • SHAPE Operator support added.
  • Added Asynchronous Network Execution.
  • Changed weights layout for Depthwise Convolution Operator from [M,I,H,W] to [1,H,W,I*M].

Build Dependencies

Tools Supported Version
Git 2.17.1 or later
SCons 2.4.1 (Ubuntu) 2.5.1 (Debian)
Cmake 3.5.1 (Ubuntu) and 3.7.2 (Debian)
Tensorflow 2.3.1
Onnx 1.6.0
Flatbuffer 1.12.0
Protobuf 3.12.0
Android NDK r20b
mapbox/variant 1.2.0

Android 12 Compatibility Testing was performed using the following:

Android Tag Android Build ID Mali Driver Android Compatibility Test Suite Android Vendor Test Suite
android-12 SP1A.210812.003 r32p1_01eac0 12_r1 (eng.upr473.20210901.005349)1 12_r1 (eng.upr473.20210901.024841)

1: CtsNNAPITestCases with Mali Driver r32p1_01eac0. The following test is known to be failing: AddTwoWithHardwareBufferInputWithGPUUsage. Investigations indicate this failure is due to Android NN HAL utilizing Gralloc functionality not required by the Gralloc API. This issue has been raised with Google Android team, and is tracked as https://partnerissuetracker.corp.google.com/issues/202025253. Please quote Arm reference MIDCET-3783 when discussing this issue.

Android 11 Compatibility Testing was performed using the following:

Android Tag Android Build ID Mali Driver Android Compatibility Test Suite Android Vendor Test Suite
android-11.0.0_r1 RP1A.200720.009 r31p0_01eac0 11_r4 (7352019) 11_r4(7337463)
android-11.0.0_r6 RPM1.210413.002 r32p0_01eac0 11_r4 (7352019) 11_r4 (7337463)
android-11.0.0_r6 RPM1.210413.002 r33p0_01eac0 11_r4 (7352019) 11_r4 (7337463)

Android 10 Compatibility Testing was performed using the following:

Android Tag Android Build ID Mali Driver
android-10.0.0_r39 QQ3A.200605.002.A1 R23P0_01REL0
armnn - Release 21.05

Published by nikraj01 over 3 years ago

Summary

The 21.05 Release of Arm NN was focused on providing new capabilities to allow users to attain higher performance by:

  • Making the Arm NN Core thread safe, opening the possibility of running multiple inferences on the same model in parallel software threads.
  • Allowing graphs on the GPU backend to import their input and output buffers either from correctly aligned main memory or from kernel memory exposed as a dma_buf, thus reducing memory usage and saving the time involved in copying data into and out of the GPU memory space.

In addition to this, support was added to allow the MobileBERT network to be parsed and run.

Finally, three deprecated components were removed: the Tensorflow Parser, the Caffe Parser and the Arm NN Quantizer tool.

New Features

  • CAST Operator support added on CpuRef, CpuAcc, GpuAcc Backends.
  • Non-const weights support added on FULLY_CONNECTED layer for CpuRef Backend.
  • Enable Input and Output Memory Import on GPU (Malloc and DmaBuf).
  • Asynchronous Network Execution for CpuRef Backend (a sketch follows this list).
  • Optimisation added to fuse PAD into Pooling2d if possible.
  • ASR sample application added to samples directory.
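
A rough sketch of the asynchronous execution flow mentioned above (CpuRef only in this release). It assumes the network has been loaded with asynchronous execution enabled via INetworkProperties and that the experimental CreateWorkingMemHandle()/Execute() entry points on IRuntime are used; exact signatures for this experimental API may differ between releases.

    #include <armnn/ArmNN.hpp>
    #include <thread>

    // Sketch only: each thread gets its own working memory handle so the same loaded
    // network can be executed concurrently.
    void ExecuteConcurrently(armnn::IRuntime& runtime, armnn::NetworkId netId,
                             const armnn::InputTensors& inputsA, const armnn::OutputTensors& outputsA,
                             const armnn::InputTensors& inputsB, const armnn::OutputTensors& outputsB)
    {
        auto handleA = runtime.CreateWorkingMemHandle(netId);   // assumption: experimental async API entry point
        auto handleB = runtime.CreateWorkingMemHandle(netId);

        std::thread t1([&] { runtime.Execute(*handleA, inputsA, outputsA); });
        std::thread t2([&] { runtime.Execute(*handleB, inputsB, outputsB); });
        t1.join();
        t2.join();
    }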

TfLite Parser

  • ABS Operator Support added.
  • ARG_MIN Operator Support added.
  • CAST Operator Support added.
  • LOGICAL_NOT Operator Support added.
  • RSQRT Operator Support added.
  • Non-const weights support added on FULLY_CONNECTED layer.
  • Turn off Biases when data location is -1 (Added to support MobileBERT).

ArmNN Serializer/Deserializer

  • Added Signed64 support to Serializer and Deserializer.
  • Added QAsymmS8 support to Serializer.
  • Added L2 Pooling algorithm to Deserializer.

ExecuteNetwork App Changes

  • Asynchronous Network Execution support (Currently for CpuRef Backend).
  • Re-enabled GPU profiling in ExecuteNetwork.

Deprecated features

  • Deprecated the Caffe Parser.
  • Deprecated the Tensorflow Parser.
  • Deprecated the Arm NN Quantizer tool.
  • Deprecated m_Output_Type from the ArgMinMaxDescriptor: the output type is solely determined by the data type of the output tensor.

Bug Fixes

  • Fix CheckProfilingObjectUids test failing on Ubuntu 21.04.
  • Fix added to Serializer to handle situations where a shape has some unspecified dimensions.
  • Fix added to AddBroadcastReshapeLayer optimisation to prevent modification to constant layers with multiple connections.
  • Fix added to use CMake value ${CMAKE_THREAD_LIBS_INIT} throughout instead of 'pthread'.
  • Fix added to handle negative axis correctly in ARG_MAX (TfLiteParser) and SPLIT (TfLiteParser & TfLiteDelegate) operators.
  • Fixed TfLiteDelegate Normalization & Softmax for Android if NDK is less than r21.
  • Fixed Deserializer issue where layer bindings were incorrectly assigning the tensor info of one output to all 4 outputs.
  • Fixed x86_64 ArmNN DockerFile.
  • Fixed TuningLevel enumeration values to be consistent.
  • Fixed YoloV3 test application's incorrect use of std::abs.
  • Improved performance on SqueezeNet v1.1.

Other Changes

  • Removed cross-wiring in DepthwiseConvolution2d. The permutation of the full tensor info is now performed in armnnUtils::Permuted.
  • Moved doctest third-party library to armnn from delegate.
  • Updated TfLiteDelegate Python Integration guide with new links. Also added information about the TFLite Model Benchmark Tool.
  • Updated Cross Compiling Guide.
  • Improved Graph memory usage.

Known Issues

  • Intermittent issue on Dma Buf memory import on GPU. This is fixed in Mali Driver r30p0.
  • There might be performance regressions against v20.08 in Inception v3 using int8 data types on Arm Mali-G77 GPUs. Currently under investigation.

ABI/API Changes

The following front-end API changes have occurred during the implementation of 21.05 that users should be aware of before upgrading. Due to these changes we have bumped our ARMNN_VERSION to 25.0.0 while also bumping our Parsers and Delegate to 24.1.0 following Semantic Versioning guidelines.

Feature SHA Gerrit Review Resultant ABI/API changes
Add Async Queue to IRuntime e813d67f86df41a238ff79b5c554ef5027f56576 https://review.mlplatform.org/c/ml/armnn/+/5493 For struct INetworkProperties the member variable size_t m_NumThreads has been added resulting in the change of size of the inclusive type.
Add front-end support for CAST + Add TfLiteParser support for CAST b392e9845b7f40ab0c389f29f13f6ec84dd814d1 https://review.mlplatform.org/c/ml/armnn/+/5374 For enum class LayerType a new enum for Cast has been added which changes the class member LastLayer to equate to Cast rather than the previous Unmap. We advise against the usage of armnn::LayerType::LastLayer where stability is required.
Add MemorySourceFlags to TensorHandleFactoryRegistry::GetFactory 73d3e2e1616ba5dcdb0a190afba2463742bd4fcc https://review.mlplatform.org/c/ml/armnn/+/5481 For struct INetworkProperties the member variable MemorySource m_InputSource has been added resulting in the change of size of the inclusive type. For struct INetworkProperties the member variable MemorySource m_OutputSource has been added resulting in the change of size of the inclusive type.
Move ILayerSupport.hpp to backends folder cae45686aeed0761ff2c9115ef0a064278ae75fa https://review.mlplatform.org/c/ml/armnn/+/5500 include/armnn/ILayerSupport.hpp has been moved to include/armnn/backends/ILayerSupport.hpp; this is to reflect the fact that ILayerSupport is a back-end interface. Front end users should move to using the ABI stable GetILayerSupportByBackendId().
NonConstWeights: Update front-end and TfLiteDelegate support for FullyConnected Operator f0a6dec75832604d5ab18242dc216852821a8279 https://review.mlplatform.org/c/ml/armnn/+/5180 For class LayerSupportHandle the member variable BackendId m_BackendId has been added resulting in the change of size of the inclusive type. For struct FullyConnectedDescriptor the member variable bool m_ConstantWeights has been added resulting in the change of size of the inclusive type.
Refactor Async Network API 55a8ffda24fff5515803df10fb4863d46a1effdf https://review.mlplatform.org/c/ml/armnn/+/5365 For struct INetworkProperties the member variable bool m_AsyncEnabled has been added resulting in the change of size of the inclusive type.
Remove cross-wiring in depthwise 7612bd6cc385dfbf54f831a6349f3a9363c6d0a2 https://review.mlplatform.org/c/ml/armnn/+/5411 For method armnnUtils::Permuted() the argument bool perChannelPermute which was defaulted to false has been removed.
Remove Quantizer 4a621c43174b6bdd9dc0bff839b245bc2139d6a6 https://review.mlplatform.org/c/ml/armnn/+/5486 The formerly deprecated class INetworkQuantizer has been removed and so any code making use of it must be altered.

The following back-end API changes have occurred during the implementation of 21.05 that users should be aware of before upgrading.

Feature SHA Gerrit Review Resultant ABI/API changes
NonConstWeights: Update front-end and TfLiteDelegate support for FullyConnected Operator 16fb1a2d9c1d3d80c0f0b6ab549919fbabd2a0b9 https://review.mlplatform.org/c/ml/armnn/+/5180 For class IBackendInternal the virtual method HasCapability ( enum BackendCapability ) const has been added. As a result the layout of v-table has been changed. Calls of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
Move ILayerSupport.hpp to backends folder cae45686aeed0761ff2c9115ef0a064278ae75fa https://review.mlplatform.org/c/ml/armnn/+/5500 include/armnn/ILayerSupport.hpp has been moved to include/armnn/backends/ILayerSupport.hpp; this is to reflect the fact that ILayerSupport is a back-end interface.
Generalise ConstCpuTensorHandle 1f58f03d82c482626b1b4673b6c0e25da4338fb5 https://review.mlplatform.org/c/ml/armnn/+/5515 include/armnn/backends/CpuTensorHandleFwd.hpp has been deprecated and replaced with include/armnn/backends/TensorHandleFwd.hpp and the forward declarations it contained have also been renamed to remove "Cpu".
Enable import on GPU e5f0b2409c2e557a5a78e2f4659d203154289b23 https://review.mlplatform.org/c/ml/armnn/+/5605 For class IBackendInternal the virtual method CreateWorkloadFactory with MemorySourceFlags inputFlags/outputFlags arguments has been added. As a result the layout of v-table has been changed. Calls of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications. For class IBackendInternal the virtual method RegisterTensorHandleFactories with MemorySourceFlags inputFlags/outputFlags arguments has been added. As a result the layout of v-table has been changed. Calls of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications. For class ITensorHandleFactory the method SupportsMapUnmap() is no longer final.

TfLite Delegate

New features

  • Non-const weights support added on FULLY_CONNECTED layer
  • CAST operator support
  • PACK operator support
  • UNPACK operator support
  • Added program options to armnn_external_delegate.cpp
    • enable-fast-math
    • number-of-threads
    • save-cached-networks
    • cached-network-filepath
  • Signed64 support added

Bug Fixes

  • Fix added to set the correct index for connecting constant layers.
  • Fix added to handle negative axis correctly in SPLIT operator.

Build Dependencies

Tools Supported Version
Git 2.17.1 or later
SCons 2.4.1 (Ubuntu) 2.5.1 (Debian)
CMake 3.7.2 or later
boost 1.64
Tensorflow 2.3.1
Onnx 1.6.0
Flatbuffer 1.12.0
Protobuf 3.12.0
Android NDK r20b
mapbox/variant 1.2.0

Android 11 Compatibility Testing was performed using the following:
Android Tag Android Build ID Mali Driver Android Compatibility Test Suite Android Vendor Test Suite
android-11.0.0_r1 RP1A.200720.009 R30P0_01EAC0 11_r3 (7127450) 11_r3 (7137996)
android-11.0.0_r1 RP1A.200720.009 R31P0_01EAC0 11_r3 (7127450) 11_r3 (7137996)
android-11.0.0_r6 RPM1.210413.002 R32P0_01EAC0 11_r4 (7352019) 11_r4 (7337463)

Android 10 Compatibility Testing was performed using the following:
Android Tag Android Build ID Mali Driver
android-10.0.0_r39 QQ3A.200605.002.A1 R23P0_01REL0
armnn - Release 21.02

Published by nikraj01 over 3 years ago

Summary

New Features:

  • Added ability to save and load the ClContext through ExecuteNetwork and the Android-nn-driver.
    • This will remove the time taken for initial compilation of OpenCL kernels and speed up the first execution.
  • Semantic Versioning for ArmNN APIs.
  • Arm NN TfLite Delegate (more extensive details in Arm NN TfLite Delegate section)
    • Further operator support.
    • Add capability to build on Android.
  • Verification of support for SSD-MobileNetv2 & SSD-MobileNetv3.

TfLite Parser

  • Added support for ELU activation.
  • Support Dilation in Conv2D.

ONNX Parser

  • Support Dilation in Conv2D.

Caffe Parser

  • Added Dilation support.
  • Added argmax deconv support.

ArmNN Serializer

  • Serialise ArmNN Model on android-nn-driver.

Public API Changes:

Backend API Changes:

ExecuteNetwork App Changes:

  • Two optimization parameters were added to enable saving and loading of the ClContext (a sketch of the equivalent programmatic options follows this list).
    • save-cached-network
    • cached-network-filepath
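
For reference, a sketch of what the equivalent programmatic path might look like when optimizing for GpuAcc. The backend option keys used here (SaveCachedNetwork, CachedNetworkFilePath) are assumptions mirroring the ExecuteNetwork parameters above, and the cache path is a placeholder.

    #include <armnn/ArmNN.hpp>
    #include <armnn/BackendOptions.hpp>

    // Sketch only: passing the equivalent of ExecuteNetwork's save-cached-network /
    // cached-network-filepath options to the optimizer via ModelOptions.
    // The backend option keys are assumptions, not a verified listing.
    armnn::IOptimizedNetworkPtr OptimiseWithClContextCache(const armnn::INetwork& network,
                                                           armnn::IRuntime& runtime)
    {
        armnn::BackendOptions gpuOptions("GpuAcc",
        {
            { "SaveCachedNetwork",     true },
            { "CachedNetworkFilePath", "/tmp/clcontext.bin" }   // placeholder path
        });

        armnn::OptimizerOptions options;
        options.m_ModelOptions.push_back(gpuOptions);

        return armnn::Optimize(network, { armnn::Compute::GpuAcc },
                               runtime.GetDeviceSpec(), options);
    }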

Other changes:

  • Make it easier for backends to traverse the subgraph during optimization by sorting SubgraphView layers on construction.
  • Added CL/NEON implementation of RANK Workload.
  • Added REDUCE layer for REDUCE_MAX, REDUCE_MIN, REDUCE_SUM operators.
  • Added REDUCE_MAX, REDUCE_MIN, and REDUCE_SUM operator support for the CpuRef Backend.
  • Added REDUCE_MAX, REDUCE_MIN, and REDUCE_SUM operator support/workload for the CpuAcc Backend.
  • Added REDUCE_MAX, REDUCE_MIN, and REDUCE_SUM operator support/workload for the GpuAcc Backend.
  • Added more Fused Activation unit tests.
  • Handle Neon optionality on 32 bit linux platforms.
  • Validated MobileNetv2-SSD and MobileNetv3-SSD support.
  • Add CpuAcc specific configuration option numberOfThreads.
  • Add GpuAcc MLGO tuning file configuration argument.

Bug Fixes:

  • Default stride values in depthwise and convolution to 1 instead of 0.
  • Fixed transpose conv InferOutputShape.
  • Fix incorrect padding value for asymmetric quantized type.
  • Fix build breaks for armnnDeserializer test and Threads.cpp for macosx.
    • Further fix for macosx where filenames are case insensitive.
  • Fixed unit test failure on mipsel/s390x/ppc64/powerpc.
  • Fixed ArmnnQuantizer incorrectly quantizing all data types.
  • Fixed TFLite parser not parsing TransposeConvolution.
  • Fix TfLite parser and ExecuteNetwork issues where error was not thrown in some cases.
  • Fix wav2letter not producing correct output for Neon backend.
  • Fix ReduceLayer InferOutputShape issue so that the correct axis data is read in TfLiteParser.
  • Fix Reduce workload to allow input tensors of any rank into the validate function.
  • Updated JsonPrinterTestImpl to use CpuLogitsDLogSoftmaxKernel_#.
  • Add missing serializer support for m_DimensionsSpecificity.
  • Removed unnecessary friend function in INetwork and fixed TransformIterator operator= to allow compilation on further compilers.

Known issues:

Deprecation Notification:

The following components have been deprecated and will be removed in the next release (21.05) of Arm NN.

Ubuntu 16.04 LTS is reaching End of Life.

Ubuntu Linux 16.04 LTS will no longer be supported by April 30, 2021.
At that time, Ubuntu 16.04 LTS will no longer receive security patches or other software updates.
Consequently, from the 21.08 release at the end of August 2021, Arm NN will no longer be officially supported on Ubuntu 16.04 LTS and will instead be supported on Ubuntu 18.04 LTS.

TfLite Delegate

New Features:
  • Enabled ELU Activation.
  • Enabled HARD_SWISH Activation.
  • Added GATHER operator support.
  • Added Logical AND, NOT and OR operator support.
  • Added PAD operator support.
  • Added PADV2 operator support.
  • Added SPLIT operator support.
  • Added SPLIT_V operator support.
  • Added ARG_MAX operator support.
  • Added ARG_MIN operator support.
  • Added LOCAL_RESPONSE_NORMALIZATION operator support.
  • Added L2_NORMALIZATION operator support.
  • Added BATCH_TO_SPACE_ND operator support.
  • Added SPACE_TO_BATCH_ND operator support.
  • Added DEPTH_TO_SPACE operator support.
  • Added SPACE_TO_DEPTH operator support.
  • Added SUM operator support.
  • Added REDUCE_MAX, REDUCE_MIN operator support.
  • Added FLOOR operator support.
  • Added OptimizerOptions
    • Reduce Float32 to Float16.
    • Reduce Float32 to BFloat16.
    • Enable debug data.
    • Enable memory import.
  • Added STRIDED_SLICE operator support.
  • Added LSTM operator support.
Other Changes:
  • Provided Android build.
  • Removed Tensorflow requirement.
Bug Fixes:
  • Fixed fused activation in Fully Connected layer.
  • Fixed TfLiteDelegate Reshape operator failure when running models with 2D shape tensor.
Known Issues:

Note: We have added pre-built binaries of 21.02 Arm NN along with this release (please see the Assets). Please refer to the BuildGuideNative.md guide in armnn/delegate for more information.

Build dependencies:

Tools Supported Version
Git 2.17.1 or later
Scons 2.4.1 (Ubuntu) and 2.5.1 (Debian)
CMake 3.5.1 (Ubuntu) and 3.7.2 (Debian)
Boost 1.64
Tensorflow 2.3.1
Caffe tag 1.0
Flatbuffer 1.12.0
Protobuf 3.12.0
Eigen3 3.3
Android NDK r20b
mapbox/variant 1.2.0

Android 11 Compatibility Testing was performed using the following:
Android Tag Android Build ID Mali Driver Android Compatibility Test Suite Android Vendor Test Suite
android-11.0.0_r1 RP1A.200720.009 R26P0_01EAC0, R30P0_01EAC0 11_r2 (6965179) 11_r2 (6961477)

Android 10 Compatibility Testing was performed using the following:
Android Tag Android Build ID Mali Driver
android-10.0.0_r39 QQ3A.200605.002.A1 R23P0_01REL0

Note: Going forward, Arm NN will make documentation updates against the latest release where any have been missed, and these will be available on GitHub by selecting the doc tag corresponding to the release. For example, tag 21.02.doc1 is the 21.02 release plus some documents that were updated for the 21.02 release; there are no functional changes. These document changes are cherry-picked to branches/armnn_21_02.

armnn - Release 20.11

Published by nikraj01 almost 4 years ago

Summary

The 20.11 Release was intended to provide major improvements to usability and performance in addition to delivering some additional functionality.

The usability enhancements were:

  • Added Debian packaging for ArmNN Core, TfLite Parser and PyArmNN to Ubuntu Launchpad. This means users on Linux no longer need to go through a source repository setup and compile in order to start working.
  • Addition of the TfLite Delegate along with 21 of its most valuable operators. This allows a much larger set of models to be executed, as operators that are not accelerated by the delegate will execute in the TfLite interpreter.
  • Removal of the Boost framework from all ArmNN code except our unit tests. This simplifies deployment, as the dependency on Boost no longer exists.
  • Website updates (better layout and more examples).

The performance enhancements were:

  • ArmNN integration of Compute Library Activation and Batch Normalization fusing.
  • ArmNN exposed the Compute Library fastmath option as a parameter that can be set on a per-model basis; in some scenarios it will result in the selection of a faster convolution algorithm (Winograd) at the cost of some accuracy.

The additional functionality was:

  • Addition of high priority partner requested Logical AND/OR/NOT operators in NNAPI.
  • Support for Android R, verified against CTS 11_r3 (Build Id: 20201114.173303).
  • Added support for the EfficientNet-Lite Model.

New Features:

  • Added Debian packaging, which allows ArmNN to be installed via our APT repository on Ubuntu's Launchpad.
  • Added ability to turn on the Compute Library fast_math option through ExecuteNetwork and the Android-nn-driver (a sketch of the equivalent programmatic option follows this list).
    • Using the fast_math flag can lead to performance improvements in fp32 and fp16 layers but at the cost of some accuracy.
    • The fast_math flag will not have any effect on int8 performance.
  • Added support for Logical NOT, AND and OR for CpuRef, CpuAcc and GpuAcc.
  • Added optimization to fuse BatchNorm into Convolution and Depthwise Convolution in fp32 and fp16.
  • Added backend specific optimization to fuse Activations into the previous workload.
    • Currently Activations can be fused with Addition, BatchNorm, Convolution, Depthwise Convolution, Division, Multiplication or Subtraction workloads on both CpuAcc and GpuAcc.
    • Not all workloads can support all Activations.
  • Added AddBroadcastReshapeLayer as optimizer.
  • Added Map layer and Map workload. This layer has 1 input slot and 0 output slots and simply calls ->Map() on the input tensor handle.
  • Added Unmap layer and Unmap workload. This layer has N input slots and 0 output slots and simply calls ->Unmap() on the input 0 tensor handle. The remaining inputs are used for determining scheduling dependencies.
  • Added support for TfLite Delegate (More information below in TfLite Delegate section).
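
As a programmatic counterpart to the ExecuteNetwork/Android-nn-driver flags above, the sketch below carries the option through ModelOptions. The "FastMathEnabled" key and the presence of m_ModelOptions on OptimizerOptions in this release are assumptions mirroring the fast_math flag described here.

    #include <armnn/ArmNN.hpp>
    #include <armnn/BackendOptions.hpp>

    // Sketch only: the "FastMathEnabled" backend option key is an assumption mirroring
    // the fast_math flag described above; it is applied per model via ModelOptions.
    armnn::OptimizerOptions MakeFastMathOptimizerOptions()
    {
        armnn::OptimizerOptions options;
        options.m_ModelOptions.push_back(
            armnn::BackendOptions("GpuAcc", { { "FastMathEnabled", true } }));
        options.m_ModelOptions.push_back(
            armnn::BackendOptions("CpuAcc", { { "FastMathEnabled", true } }));
        return options;
    }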

TfLite Parser:

  • Removed AddBroadcastReshapeLayer from the TfLite Parser and added it to the optimizations.
  • TfLite version updated to 2.3.1.

Tf Parser:

  • Tensorflow version updated to 2.3.1.
  • Add support for 2nd input to ExpandDims in TfParser.

ArmNN Serializer:

  • Added support for Logical NOT, AND and OR.

Public API Changes:

Backend API Changes:

ExecuteNetwork App Changes:

  • Added ability to enable Compute Library fast_math through ExecuteNetwork.
  • Added ability to execute models using TfLiteDelegate.
  • Refactored ExecuteNetwork to support cxxopts.
  • Allowed use of dynamic backendId in ExecuteNetwork.

Other changes:

  • Removed remaining boost from ArmNN runtime code (Boost still resides in Unit Tests).
    • Removed boost::format and swapped to fmt
      • Linked fmt statically and changed it to a header-only interface library
    • Removed boost::tokenizer and boost::escaped_list_separator to avoid use of CsvReader
    • Removed boost::make_iterator_range and boost::to_upper_copy
    • Removed boost::transform_iterator and make_transform_iterator
    • Removed boost::numeric_cast
    • Removed boost::math::fpc uses
    • Removed boost/preprocessor.hpp
    • Removed boost::program_options and swapped to cxxopts
    • Removed boost::variant and swapped to mapbox/variant library
    • Removed Boost from standalone dynamic backend
    • Removed remaining Boost references from test executables
  • Extended dump file with info about fused layers.
  • Added SECURITY.md file that contains the security policy, vulnerability reporting procedure and a PGP key that can be used to create secure vulnerability reports.
  • Graph::Print() now outputs more information such as number of input/output tensors and tensor dimensions.
  • Updated Protobuf to 3.12.0.
  • Load dynamic backends for YoloV3 tests.
  • Included layer GUID in SerializeToDot output.
  • Refactored the Optimize(...) function to throw exceptions instead of returning null (see the sketch after this list).
  • Sped up the reference backend.
  • Added int32 and int64 ArgMax op support.
  • Added Quantization operator=() function to Tensor.
  • Introduced ModelOptions to OptimizedNetwork.
    • Added ability to pass ModelOptions through Network::LoadNetwork() to the workload factory.
  • Added Load-scope dynamic tensor TfLite tests.
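
The error-handling change to Optimize(...) means callers should now catch exceptions rather than test for a null network. A minimal sketch, using the hypothetical helper name OptimizeOrReport and the standard Optimize() overload:

```cpp
// Optimize(...) now throws (e.g. an armnn::Exception) instead of returning null.
#include <armnn/ArmNN.hpp>
#include <armnn/Exceptions.hpp>
#include <iostream>

armnn::IOptimizedNetworkPtr OptimizeOrReport(const armnn::INetwork& network,
                                             armnn::IRuntime& runtime)
{
    try
    {
        return armnn::Optimize(network,
                               { armnn::Compute::CpuAcc, armnn::Compute::CpuRef },
                               runtime.GetDeviceSpec());
    }
    catch (const armnn::Exception& e)
    {
        // Previously a null IOptimizedNetworkPtr signalled failure; now the
        // reason arrives as an exception message.
        std::cerr << "Optimize failed: " << e.what() << std::endl;
        throw;
    }
}
```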

Bug Fixes:

  • Fixed Unittest failure while building using EthosNAcc backend.
  • Fixed crash on model with FullyConnected Sigmoid Activation by adding a supported-activations check to Neon FullyConnected validation.
  • Fixed logical VTS skip.
  • Fixed issue where EthosNAcc backend would output all zeros when falling back to CpuRef.
  • Fixed issue causing SSD Mobilenet f16/uint8 to fail on CpuRef via ExecuteNetwork.
  • Fixed issue with signed-int8 quantized model.
  • Fixed error running EfficientNet-Lite on GpuAcc.
  • Fixed validation for per-channel quantization.
  • Fixed segfault between Neon and Cl layers.
  • Fixed NonMaxSuppression.
  • Fixed Yolov3 producing 0s on Neon.
  • Removed Resize from list of layers that need padding in Neon.
  • In Neon and CL MUL workloads, use SATURATE as the convert policy if one of the inputs is quantized, and WRAP in all other cases.
  • Fixed non-channel per axis quantization.
  • Fixed compiler implicit copy deprecation warning by updating Quantization copy constructor.
  • Fixed issue where PyArmNN had hard dependencies on all parsers when using CMake.
  • Fixed cxxopts and ghc cross compilation issue.
  • Fixed undefined reference to GetIdStatic() in DynamicBackendsTests.

Known Issues:

  • Using a comma-separated list to specify multiple compute devices (--compute CpuRef,CpuAcc) with ExecuteNetwork does not work. To use multiple compute devices, specify the option multiple times: --compute CpuRef --compute CpuAcc.

TfLite Delegate:

New Features:

Currently supported operators:

  • Activation (ReLu, Relu6, Logistic, and TanH)
  • Comparison (Equal, Greater, GreaterOrEqual, Less, LessOrEqual, NotEqual)
  • Control (Concat and Mean)
  • Convolution (Convolution2d, DepthwiseConvolution2d and TransposeConvolution)
  • ElementWiseBinary (Add, Div, Max, Min, Mul, Sub)
  • ElementWiseUnary (Abs, Exp, Neg, Rsqrt, Sqrt)
  • FullyConnected
  • Pooling (MaxPool2d, AveragePool2d and L2Pool2d)
  • Quantization (Dequantize and Quantize)
  • Redefine (Reshape)
  • Resize (Bilinear and NearestNeighbour)
  • Softmax (Softmax and LogSoftmax)
  • Transpose
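
For reference, the sketch below shows how a TfLite interpreter is typically pointed at the delegate; the header and function names follow the delegate's public interface as documented, and should be treated as assumptions for this specific release.

```cpp
// Minimal sketch: run a .tflite model with the Arm NN TfLite Delegate on CpuAcc;
// operators the delegate does not accelerate stay on the TfLite interpreter.
#include <armnn_delegate.hpp>
#include <tensorflow/lite/interpreter.h>
#include <tensorflow/lite/kernels/register.h>
#include <tensorflow/lite/model.h>

#include <memory>
#include <vector>

int RunWithArmnnDelegate(const char* modelPath)
{
    // Build a TfLite interpreter as usual.
    auto model = tflite::FlatBufferModel::BuildFromFile(modelPath);
    tflite::ops::builtin::BuiltinOpResolver resolver;
    std::unique_ptr<tflite::Interpreter> interpreter;
    tflite::InterpreterBuilder(*model, resolver)(&interpreter);

    // Create the Arm NN delegate for CpuAcc and register it with the interpreter.
    std::vector<armnn::BackendId> backends = { armnn::Compute::CpuAcc };
    armnnDelegate::DelegateOptions delegateOptions(backends);
    std::unique_ptr<TfLiteDelegate, decltype(&armnnDelegate::TfLiteArmnnDelegateDelete)>
        theArmnnDelegate(armnnDelegate::TfLiteArmnnDelegateCreate(delegateOptions),
                         armnnDelegate::TfLiteArmnnDelegateDelete);
    interpreter->ModifyGraphWithDelegate(theArmnnDelegate.get());

    interpreter->AllocateTensors();
    return interpreter->Invoke() == kTfLiteOk ? 0 : 1;
}
```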

Other Changes:

  • Created the TfLite Delegate sub-directory in ArmNN.
  • Added Fp16 support.
  • Updated Tensorflow from v1.15 to v2.3.1.
  • Activated compiler warnings when building delegate.
  • Added ability to execute models through ExecuteNetwork using the TfLiteDelegate.

Known Issues:

Build dependencies:

Tools Version we support
Git 2.17.1 or later
SCons 2.4.1 (Ubuntu) and 2.5.1 (Debian)
CMake 3.5.1 (Ubuntu) and 3.7.2 (Debian)
boost 1.64
Tensorflow 2.3.1
Caffe tag 1.0
Onnx 1.6.0
Flatbuffer 1.12.0
Protobuf 3.12.0
Eigen3 3.3
Android 10 and 11
Mali Driver r25p1_01bet0
Android NDK r20b
mapbox/variant 1.2.0
armnn - Release 20.08

Published by nikraj01 about 4 years ago

Summary

The 20.08 Release delivers the following:

  • The final tranche of support for Android R ahead of its September release, namely QoS functionality, Fill, Rank and the new Resize options.
  • Support for dynamic tensors where the size of any unspecified tensors can be inferred at network load time.
  • Performance enhancements on the NEON backend eliminating unnecessary copying of data in memory, namely:
    • The ability to directly import and export data into an inference graph (see the sketch after this list).
    • The ability to use subtensors where possible in split and concat workloads.
  • Verification of support for TensorFlow Lite wav2letter and wav2letter tiny models (note: further work is needed to verify accuracy in the next release).
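
The direct import/export item above is exposed when loading a network. A minimal sketch, assuming the INetworkProperties flags available in this release (importEnabled/exportEnabled) and the hypothetical helper name LoadWithZeroCopy:

```cpp
// Load an optimized network so CpuAcc can map suitably aligned user buffers
// instead of copying input/output data.
#include <armnn/IRuntime.hpp>

#include <string>
#include <utility>

armnn::NetworkId LoadWithZeroCopy(armnn::IRuntime& runtime,
                                  armnn::IOptimizedNetworkPtr optimizedNet)
{
    armnn::NetworkId networkId{};
    std::string errorMessage;

    // importEnabled/exportEnabled ask the backend to work directly on the
    // caller's buffers where the alignment requirements are met.
    armnn::INetworkProperties properties(/*importEnabled=*/true,
                                         /*exportEnabled=*/true);

    runtime.LoadNetwork(networkId, std::move(optimizedNet), errorMessage, properties);
    return networkId;
}
```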

New Features:

  • Added FILL operator support for CpuRef, CpuAcc, and GpuAcc.
  • Added RANK operator support for CpuRef.
  • Added align corner and half pixels support to the RESIZE operator for CpuRef, CpuAcc, and GpuAcc.
  • Refactored TensorShape to support dynamic tensors (tensors of unknown dimension sizes or even unknown rank); see the sketch after this list.
  • Enabled memory import in CpuAcc.
  • Allow using Sub-Tensors on CpuAcc on ConcatenationLayer if concatenation is along x or y (2 innermost dimensions) and previous layers do not require padding.
  • Allow using Sub-Tensors on CpuAcc on SplitterLayer if split is along x or y (2 innermost dimensions) and next layers do not require padding.
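
The TensorShape refactor referenced above allows shapes to be only partially specified. The sketch below illustrates the idea; the constructor and enum names reflect my understanding of the refactored headers and should be treated as assumptions:

```cpp
// Describe tensors whose dimension sizes, or even rank, are not yet known;
// shape inference can resolve them at network load time.
#include <armnn/Tensor.hpp>
#include <armnn/Types.hpp>

int main()
{
    // Rank is known (4) but the individual dimension sizes are unspecified.
    armnn::TensorShape sizesUnknown(4, /*initDimensionsSpecificity=*/false);

    // Even the rank is unknown.
    armnn::TensorShape rankUnknown(armnn::Dimensionality::NotSpecified);

    // A TensorInfo built from a dynamic shape can be set on an output slot
    // and filled in later by shape inference.
    armnn::TensorInfo info(sizesUnknown, armnn::DataType::Float32);
    return info.GetNumDimensions() == 4 ? 0 : 1;
}
```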

TfLite Parser:

  • Added DIV operator support.
  • Added LEAKY_RELU operator support.
  • Added NEG operator support.
  • Added HARD_SWISH operator support.
  • Added support for Dynamic Tensors Type 1 (the output shape can be inferred from the input shape; the input shape always has to be set, while the output shape can be dynamic).

Public API Changes:

  • Added ITensorHandleFactory::GetCapabilities to calculate capability of the TensorHandleFactor.

ExecuteNetwork App Changes:

  • Added the -infer-output-shape option: when enabled, it turns on ShapeInferenceMethod::InferAndValidate in the TfLiteParser, which supports dynamic tensors type 1, where the output shape can be inferred from the input shape (see the sketch below).
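
Under the hood this maps to a parser option; a minimal sketch, assuming the TfLiteParserOptions field is named m_InferAndValidate:

```cpp
// Enable InferAndValidate shape inference when creating the TfLite parser,
// which is what ExecuteNetwork's -infer-output-shape switch drives.
#include <armnnTfLiteParser/ITfLiteParser.hpp>

int main()
{
    armnnTfLiteParser::ITfLiteParser::TfLiteParserOptions options;
    options.m_InferAndValidate = true; // assumption: maps to ShapeInferenceMethod::InferAndValidate

    auto parser = armnnTfLiteParser::ITfLiteParser::Create(options);
    // parser->CreateNetworkFromBinaryFile("model.tflite") would then infer
    // unspecified output shapes from the input shapes while parsing.
    return parser ? 0 : 1;
}
```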

Other changes:

  • Added EXP operator support to CpuAcc and GpuAcc.
  • Added ADD, SUB, DIV, MUL, MAXIMUM and MINIMUM int32 support in CpuRef.
  • Added PRELU float16 support in CpuRef.
  • Added ARGMINMAX float16 support in CpuRef.
  • Added GATHER support for any axis in CpuAcc and GpuAcc (previously the support was only for axis = 0).
  • Added LOGSOFTMAX support in CpuAcc and GpuAcc.
  • Added support for subtensors on Splitter layer for splitting x/y axis if no padding required on next layer.
  • Added support for subtensors on Concat layer for concatenating x/y axis if no padding required on previous layer.
  • Replaced boost::filesystem with ghc::filesystem.
  • Removed boost/dll.hpp from the dynamic backends test.
  • Separated external profiling server code into a standalone library.

Bug Fixes:

  • Added ability for Mean Reduction to reduce to scalar.
  • Added ability for Strided Slice to shrink to scalar.
  • Added a check for Strided Slice to not run when stride is negative and ShrinkAxisMask set.
  • Fix edge case for transposeConv2d output shape inference.
  • Fix deserializer output binding TensorShape logic.
  • Fixed issue where AddBroadcastReshapeLayer would always connect the Reshaped input to the first input slot and the other input to the first input slot.
  • Removed TfLite Concat and Pad quantization validation.

Build dependencies

Tools Version we support
Git 2.17.1 or later
SCons 2.4.1 (Ubuntu) and 2.5.1 (Debian)
CMake 3.5.1 (Ubuntu) and 3.7.2 (Debian)
boost 1.64
Tensorflow TENSORFLOW_REVISION= 590d6eef7e91a6a7392c8ffffb7b58f2e0c8bc6b (v1.15.0)
Caffe CAFFE_REVISION= 7d3f8a7ea43fb06cd9804bc90933c7a91cd88ec9
Onnx ONNX_REVISION= f612532843bd8e24efeab2815e45b436479cc9ab
Flatbuffer 1.12.0
Protobuf 3.5.2
Eigen3 3.3
Android 9 and 10
Mali Driver r25p1_01bet0
Android NDK r20b
armnn - Release 20.05

Published by nikraj01 over 4 years ago

New Features:

  • Added comparison operators (EQUAL, NOT_EQUAL, GREATER, GREATER_EQUAL, LESS, LESS_EQUAL) support to CpuAcc and GpuAcc backends
  • Added EXP operator support to CpuAcc backend
  • Added NEG operator support to CpuAcc and GpuAcc backend
  • Added QLSTM operator partial support (projection not yet supported) to Reference backend
  • Added QLSTM operator full support to CpuAcc and GpuAcc backends
  • Added Boolean data type
  • Added QAsymmS8 to ArmnnQuantizer
  • Added QAsymmS8 data type
  • Added BFloat16 data type
  • Added BFloat16 support to Reference backend
    • Activation
    • Addition
    • ArgMinMax
    • BatchNormalization
    • BatchToSpaceNd
    • Comparison
    • Concat
    • Constant
    • Convolution2d
    • Debug
    • DepthToSpace
    • DepthwiseConvolution2d
    • DetectionPostProcess
    • Equal
    • Floor
    • FullyConnected
    • Gather
    • Input
    • InstanceNormalization
    • L2Normalization
    • LogSoftmax
    • Lstm
    • Maximum
    • Mean
    • MemCopy
    • MemImport
    • Merge
    • Minimum
    • Multiplication
    • Normalization
    • Output
    • Pad
    • Permute
    • Pooling2d
    • Quantize
    • Division
    • Prelu
    • Reshape
    • Resize
    • Slice
    • Softmax
    • SpaceToBatchNd
    • SpaceToDepth
    • Splitter
    • Stack
    • StandIn
    • StridedSlice
    • Subtraction
    • Switch
    • TransposeConvolution2d
    • Transpose
  • Added support for BFloat16 turbo mode

TfLite Parser:

  • Added support for STRIDED_SLICE operator
  • Added support for EXP operator
  • Added support for SPLIT_V operator
  • Enabled SPLIT along any dimension

Tf Parser:

  • Added support for PACK/STACK

ArmNN Serializer

  • Added QSymmS8 data type support to the ArmNN Serializer schema (ArmnnSchema.fbs)
  • Added per-axis quantization parameters to ArmnnConverter (Serializer - Deserializer) tool

Public API Changes:

  • Added Activate and Deactivate timeline control packets to the External Profiling protocol

Backend API Changes:

  • Added a backend hint API to select the preferred backend on a per-layer basis (see the sketch below)
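
A minimal sketch of using the hint from network-construction code, assuming the entry point is IConnectableLayer::BackendSelectionHint and that the optimizer treats the hint as a preference rather than a guarantee:

```cpp
// Suggest that a single layer be placed on GpuAcc if possible.
#include <armnn/BackendId.hpp>
#include <armnn/Descriptors.hpp>
#include <armnn/INetwork.hpp>

void PreferGpuForSoftmax(armnn::INetwork& network, armnn::IConnectableLayer* previous)
{
    armnn::SoftmaxDescriptor softmaxDesc;
    armnn::IConnectableLayer* softmax = network.AddSoftmaxLayer(softmaxDesc, "softmax");
    previous->GetOutputSlot(0).Connect(softmax->GetInputSlot(0));

    // Per-layer hint; other layers still follow the backend preference list
    // passed to armnn::Optimize(...).
    softmax->BackendSelectionHint(armnn::BackendId("GpuAcc"));
}
```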

Bug Fixes:

  • Fixed segfault parsing reshape layer
  • Fixed ArmNN compile error when compiling with GCC 9
  • Fixed unit test errors when running on Raspberry Pi caused by the platform-dependent size of thread::id
  • Fixed LSTM layer CellToInputWeights

Other changes:

  • Separated out BasePipeServer library from GatorDMock
  • Introduced polymorphic_downcast implementation
  • Introduced numeric_cast implementation (see the sketch after this list)
  • Introduced PolymorphicPointerDowncast implementation
  • Removed boost::ignore_unused
  • Removed boost::polymorphic_pointer_downcast
  • Removed boost::polymorphic_downcast
  • Eliminated space restriction in batch norm layer which was giving errors when loading a quantized model
  • Doxygen beautification
  • Integration of PyArmNN
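
The in-house casting utilities that replaced the Boost equivalents are used as sketched below; the header locations match later releases and should be treated as assumptions here:

```cpp
// armnn::numeric_cast replaces boost::numeric_cast; armnn::PolymorphicDowncast
// replaces boost::polymorphic_downcast.
#include <armnn/utility/NumericCast.hpp>
#include <armnn/utility/PolymorphicDowncast.hpp>

#include <cstddef>

struct Base { virtual ~Base() = default; };
struct Derived : Base { int m_Value = 42; };

int main()
{
    // Checked narrowing conversion.
    std::size_t elementCount = 128;
    unsigned int count = armnn::numeric_cast<unsigned int>(elementCount);

    // Downcast that is validated in debug builds (assumption) and behaves
    // like a static_cast in release builds.
    Derived derived;
    Base* base = &derived;
    Derived* asDerived = armnn::PolymorphicDowncast<Derived*>(base);

    return (count == 128u && asDerived->m_Value == 42) ? 0 : 1;
}
```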

Known issues:

Build Dependencies:

Tools Version we support
Git 2.17.1 or later
SCons 2.4.1 (Ubuntu) and 2.5.1 (Debian)
CMake 3.5.1 (Ubuntu) and 3.7.2 (Debian)
boost 1.64
Tensorflow TENSORFLOW_REVISION= 590d6eef7e91a6a7392c8ffffb7b58f2e0c8bc6b (v1.15.0)
Caffe CAFFE_REVISION= 7d3f8a7ea43fb06cd9804bc90933c7a91cd88ec9
Onnx ONNX_REVISION= f612532843bd8e24efeab2815e45b436479cc9ab
Flatbuffer 1.10.0
Protobuf 3.5.2
Eigen3 3.3
Android 9 and 10
Mali Driver r23
Android NDK r20b