neural-compressor | Python Ecosystem Directory

neural-compressor - Intel® Neural Compressor v2.5.1 Release (Latest Release)

Published by chensuyue 7 months ago

Improvement
Bug Fixes
Validated Configurations

Improvement

Improve WOQ AutoRound export (409231, 7ee721)
Adapt example evaluation to the ITREX v1.4 release (9d7a05)
Update more supported LLM recipes (ce9b16)

Bug Fixes

Fix WOQ RTN supported layer checking condition (079177)
Fix in-place processing error in quant_weight function (92533a)

Validated Configurations

Centos 8.4 & Ubuntu 22.04
Python 3.10
TensorFlow 2.15
ITEX 2.14.0
PyTorch/IPEX 2.2
ONNX Runtime 1.17

neural-compressor - Intel® Neural Compressor v2.5 Release

Published by chensuyue 7 months ago

Highlights
Features
Improvement
Productivity
Bug Fixes
External Contributions
Validated Configurations

Highlights

Integrated Weight-Only Quantization algorithm AutoRound and verified on Gaudi2, Intel CPU, NV GPU
Applied SmoothQuant & Weight-Only Quantization algorithms with 15+ popular LLMs for INT8 & INT4 quantization and published the recipes
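
As a rough illustration of what INT4 weight-only quantization does to a single weight row, the sketch below applies plain round-to-nearest (RTN) with one scale per group. It is not AutoRound and not Intel Neural Compressor's implementation; the function names, the group size of 4, and the sample weights are all assumptions made for the example.

```python
def rtn_quantize_group(group, bits=4):
    """Symmetric round-to-nearest quantization of one weight group."""
    qmax = 2 ** (bits - 1) - 1      # 7 for 4-bit
    qmin = -(2 ** (bits - 1))       # -8 for 4-bit
    max_abs = max(abs(w) for w in group)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [min(qmax, max(qmin, round(w / scale))) for w in group]
    return codes, scale


def rtn_quantize_row(row, group_size=4, bits=4):
    """Split a weight row into groups; each group gets its own scale."""
    return [
        rtn_quantize_group(row[start:start + group_size], bits)
        for start in range(0, len(row), group_size)
    ]


def dequantize_row(groups):
    """Recover approximate float weights from integer codes and scales."""
    return [code * scale for codes, scale in groups for code in codes]


# Hypothetical weights: the second group has a larger dynamic range.
row = [0.1, -0.7, 0.35, 0.0, 2.0, -1.0, 0.5, 0.25]
groups = rtn_quantize_row(row, group_size=4, bits=4)
restored = dequantize_row(groups)
```

Each restored weight differs from the original by at most half of its group's scale, which is why smaller groups usually cost more storage for scales but lose less accuracy.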

Features

[Quantization] Integrate Weight-Only Quantization algorithm AutoRound (5c7f33, dfd083, 9a7ddd, cf1de7)
[Quantization] Quantize weight with in-place mode in Weight-Only Quantization (deb1ed)
[Pruning] Enable SNIP on multiple cards using DeepSpeed ZeRO-3 (49ab28)
[Pruning] Support new pruning approach Wanda and DSNOT for PyTorch LLM (7a3671)
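
For readers unfamiliar with Wanda, the sketch below shows the scoring idea usually attributed to it: each weight is ranked by its magnitude multiplied by the L2 norm of the matching input feature, and the lowest-scoring weights in each output row are zeroed. This is a minimal pure-Python sketch under that assumption, not the code added in this release; `wanda_prune` and the toy data are hypothetical.

```python
import math


def wanda_prune(weight, activations, sparsity=0.5):
    """Zero the lowest-scoring weights in every output row.

    weight:      list of rows, one row per output neuron
    activations: calibration inputs, one list of features per sample
    """
    n_in = len(weight[0])
    # L2 norm of each input feature across the calibration samples.
    feature_norm = [
        math.sqrt(sum(sample[j] ** 2 for sample in activations))
        for j in range(n_in)
    ]
    n_prune = int(n_in * sparsity)
    pruned = []
    for row in weight:
        scores = [abs(w) * feature_norm[j] for j, w in enumerate(row)]
        drop = set(sorted(range(n_in), key=lambda j: scores[j])[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


# Toy data: input feature 1 carries much larger activations than the others.
weight = [[0.5, -0.1, 0.3, 0.2]]
activations = [[1.0, 10.0, 1.0, 1.0], [1.0, 10.0, 1.0, 1.0]]
pruned = wanda_prune(weight, activations, sparsity=0.5)
```

In this toy case the small weight -0.1 survives because its input feature is large, whereas plain magnitude pruning would have removed it.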

Improvement

[Quantization] SmoothQuant code structure refactor (a8d81c)
[Quantization] Optimize the workflow of parsing Keras model (b816d7)
[Quantization] Support static_groups options in GPTQ API (1c426a)
[Quantization] Update TEQ train dataloader (d1e994)
[Quantization] WeightOnlyLinear keeps self.weight after recover (2835bd)
[Quantization] Add version condition for IPEX prepare init (d96e14)
[Quantization] Enhance the ORT node name checking (f1597a)
[Pruning] Stop the tuning process early when enabling smooth quant (844a03)

Productivity

ORT LLM examples support latest optimum version (26b260)
Add coding style docs and recommended VS Code setting (c1f23c)
Adapt transformers 4.37 loading (6133f4)
Upgrade pre-commit checker for black/blacken-docs/ruff (7763ed)
Support CI summary in PR comments (d4bcdd)
Update notebook example to install the latest INC & TF and add metric in fit (4239d3)

Bug Fixes

Fix QA IPEX example fp32 input issue (c4de19)
Update Conditions of Getting min-max during TF MatMul Requantize (d07175)
Fix TF saved_model issues (d8e60b)
Fix comparison of module_type and MulLinear (ba3aba)
Fix ORT calibration issue (cd6d24)
Fix ORT example bart export failure (b0dc0d)
Fix TF example accuracy diff during benchmark and quantization (5943ea)
Fix bugs for GPTQ exporting with static_groups (b4e37b)
Fix ORT quant issue caused by tensors having same name (0a20f3)
Fix Neural Solution SQL/CMD injection (14b7b0)
Fix the best qmodel recovery issue (f2d9b7)
Fix logger issue (83bc77)
Store token in protected file (c6f9cc)
Define the default SSL context (b08725)
Fix IPEX stats bug (5af383)
Fix ORT calibration for Dml EP (c58aea)
Fix wrong socket number retrieval for non-English system (5b2a88)
Fix trust remote for llm examples (2f2c9a)

External Contributions

Intel Mac support (21cfeb)
Add PTQ example for PyTorch CV Segment Anything Model (bd5e69)

Validated Configurations

Centos 8.4 & Ubuntu 22.04 & Win 11 & MacOS Ventura 13.5
Python 3.8, 3.9, 3.10, 3.11
TensorFlow 2.13, 2.14, 2.15
ITEX 2.13.0, 2.14.0
PyTorch/IPEX 2.0, 2.1, 2.2
ONNX Runtime 1.15, 1.16, 1.17

neural-compressor - Intel® Neural Compressor v2.4.1 Release

Published by chensuyue 10 months ago

Improvement
Bug Fixes
Examples
Validated Configurations

Improvement

Narrow down the tuning space of SmoothQuant auto-tune (9600e1)
Support ONNXRT Weight-Only Quantization with different dtypes (5119fc)
Add progress bar for ONNXRT Weight-Only Quantization and SmoothQuant (4d26e3)

Bug Fixes

Fix SmoothQuant alpha-space generation (33ece9)
Fix inputs error for SmoothQuant example_inputs (39f63a)
Fix LLMs accuracy regression with IPEX 2.1.100 (3cb6d3)
Fix quantizable add ops detection on IPEX backend (4c004d)
Fix range step bug in ORTSmoothQuant (40275c)
Fix unit test bugs and update CI versions (6c78df, 835805)
Fix notebook issues (08221e)

Examples

Add verified LLMs list and recipes for SmoothQuant and Weight-Only Quantization (f19cc9)
Add code-generation evaluation for Weight-Only Quantization GPTQ (763440)

Validated Configurations

Centos 8.4 & Ubuntu 22.04
Python 3.10
TensorFlow 2.14
ITEX 2.14.0.1
PyTorch/IPEX 2.1.0
ONNX Runtime 1.16.3

neural-compressor - Intel® Neural Compressor v2.4 Release

Published by chensuyue 10 months ago

Highlights
Features
Improvement
Productivity
Bug Fixes
Examples
Validated Configurations

Highlights

Supported layer-wise quantization for PyTorch RTN/GPTQ Weight-Only Quantization and ONNX Runtime W8A8 quantization.
Supported Weight-Only Quantization tuning for ONNX Runtime backend.
Supported GGML double quant on RTN/GPTQ Weight-Only Quantization with FW extension API
Supported SmoothQuant of Big Saved Model for TensorFlow Backend.
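
The "double quant" highlight refers to quantizing the quantization metadata itself. As a hedged illustration of the general idea only (not GGML's block layout and not the API added here), the sketch below takes per-group float scales and stores them as 8-bit codes plus one shared super-scale; `quantize_scales`, `dequantize_scales`, and the sample scales are hypothetical.

```python
def quantize_scales(scales, bits=8):
    """Second-level quantization: non-negative float scales become
    unsigned integer codes that share a single super-scale."""
    levels = 2 ** bits - 1
    top = max(scales)
    super_scale = top / levels if top > 0 else 1.0
    codes = [round(s / super_scale) for s in scales]
    return codes, super_scale


def dequantize_scales(codes, super_scale):
    """Recover approximate per-group scales."""
    return [c * super_scale for c in codes]


# Hypothetical per-group scales produced by a first-level weight quantizer.
group_scales = [0.1, 0.2, 0.05, 0.4]
codes, super_scale = quantize_scales(group_scales)
approx_scales = dequantize_scales(codes, super_scale)
```

The saving comes from replacing one float per group with one byte per group plus a single float per block of groups.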

Features

[Quantization] Support GGML double quant in Weight-Only Quantization for RTN and GPTQ (05c15a)
[Quantization] Support Weight-Only Quantization tuning for ONNX Runtime backend (6d4ea5, 934ba0, 4fcfdf)
[Quantization] Support SmoothQuant block-wise alpha-tuning (ee6bc2)
[Quantization] Support SmoothQuant of Big Saved Model for TensorFlow Backend (3b2925, 4f2c35)
[Quantization] Support PyTorch layer-wise quantization for GPTQ (ee5450)
[Quantization] Support PyTorch layer-wise quantization for RTN (ebd1e2)
[Quantization] Support ONNX Runtime layer-wise W8A8 quantization (6142e4, 5d33a5)
[Common] [Experimental] FW extension API implement (76b8b3, 8447d7, 258236)
[Quantization] [Experimental] FW extension API for PT backend support Weight-Only Quantization (915018, dc9328)
[Quantization] [Experimental] FW extension API for TF backend support Keras Quantization (2627d3)
[Quantization] IPEX 2.1 XPU (CPU+GPU) support (af0b50, cf847c)

Improvement

[Quantization] Add use_optimum_format for export_compressed_model in Weight-Only Quantization (5179da, 0a0644)
[Quantization] Enhance ONNX Runtime quantization with DirectML EP (db0fef, d13183, 098401, 6cad50)
[Quantization] Support restoring IPEX model from JSON (c3214c)
[Quantization] ONNX Runtime add attr to MatMulNBits (7057e3)
[Quantization] Increase SmoothQuant auto alpha running speed (173c18)
[Quantization] Add SmoothQuant alpha search space as a config argument (f9663d)
[Quantization] Add SmoothQuant weight_clipping as a default_on option (1f4aec)
[Quantization] Support SmoothQuant with MinMaxObserver (45b496)
[Quantization] Support Weight-Only Quantization with fp16 for PyTorch backend (d5cb56)
[Quantization] Support trace with dictionary type example_inputs (afe315)
[Quantization] Support falcon Weight-Only Quantization (595d3a)
[Common] Add deprecation decorator in experimental folder (aeb3ed)
[Common] Remove 1.x API dependency (ee617a)
[Mixed Precision] Support PyTorch eager mode BF16 MixedPrecision (3bfb76)

Productivity

Support quantization and benchmark on macOS (16d6a0)
Support ONNX Runtime 1.16.0 (d81732, 299af9, 753783)
Support TensorFlow new API for gnr-base (8160c7)

Bug Fixes

Fix GraphModule object has no attribute bias (7f53d1)
Fix ONNX model export issue (af0aea, eaa57f)
Add clip for ONNX Runtime SmoothQuant (cbb69b)
Fix SmoothQuant minmax observer init (b1db1c)
Fix SmoothQuant issue in get/set_module (dffcfe)
Align sparsity with block-wise masks in progressive pruning (fcdc29)

Examples

Support peft model with SmoothQuant (5e21b7)
Enable two ONNX Runtime examples table-transformer-detection (550cee), BEiT (7265df)

Validated Configurations

Centos 8.4 & Ubuntu 22.04 & Win10 & MacOS Ventura 13.5
Python 3.8, 3.9, 3.10, 3.11
TensorFlow 2.13, 2.14, 2.15
ITEX 1.2.0, 2.13.0.0, 2.14.0.1
PyTorch/IPEX 1.13.0+cpu, 2.0.1+cpu, 2.1.0
ONNX Runtime 1.14.1, 1.15.1, 1.16.3
MXNet 1.9.1

neural-compressor - Intel® Neural Compressor v2.3.2 Release

Published by chensuyue 11 months ago

Features
Bug Fixes

Features

Reduce memory consumption in ONNXRT adaptor (f64833)
Support MatMulFpQ4 for onnxruntime 1.16 (1beb43)
Support MatMulNBits for onnxruntime 1.17 (67a31b)

Bug Fixes

Update ITREX version in ONNXRT WOQ example and fix bugs in HF models (0ca51a)
Update ONNXRT WOQ example to llama-2-7b (7f2063)
Fix ONNXRT WOQ failure when model_path is None (cbd0a4)

Validated Configurations

Centos 8.4 & Ubuntu 22.04
Python 3.10
TensorFlow 2.13
ITEX 2.13
PyTorch/IPEX 2.0.1+cpu
ONNX Runtime 1.15.1
MXNet 1.9.1

neural-compressor - Intel® Neural Compressor v2.3.1 Release

Published by chensuyue about 1 year ago

Bug Fixes
Productivity

Bug Fixes

Fix PyTorch SmoothQuant for auto alpha (e9c14a, 35def7)
Fix PyTorch SmoothQuant calibration memory overhead (49e950)
Fix PyTorch SmoothQuant issue in get/set_module (Issue #1265) (6de9ce)
Support falcon Weight-Only Quantization (bf7b5c)
Remove Conv2d in Weight-Only Quantization adaptor white list (1a6526)
Fix TensorFlow ssd_resnet50_v1 Example for TF New API (c63fc5)

Productivity

Adapt Example for TensorFlow 2.14 AutoTrackable API Change (424cf3)

Validated Configurations

Centos 8.4 & Ubuntu 22.04
Python 3.10
TensorFlow 2.13, 2.14
ITEX 2.13
PyTorch/IPEX 2.0.1+cpu
ONNX Runtime 1.15.1
MXNet 1.9.1

neural-compressor - Intel® Neural Compressor v2.3 Release

Published by chensuyue about 1 year ago

Highlights
Features
Improvement
Productivity
Bug Fixes
Examples
Validated Configurations

Highlights

Integrate Intel Neural Compressor into MSFT ONNX Runtime (#16288) and Olive (#411, #412, #469).
Supported low precision (INT4, NF4, FP4) and Weight-Only Quantization algorithms including RTN, AWQ, GPTQ and TEQ on ONNX Runtime and PyTorch for LLMs optimization.
Supported sparseGPT pruner (88adfc).
Supported quantization for ONNX Runtime DML EP and DNNL EP, and verified inference on Intel NPU (e.g., Meteor Lake) and Intel CPU (e.g., Sapphire Rapids).

Features

[Quantization] Support ONNX Runtime quantization and inference for DNNL EP (79be8b)
[Quantization] [Experimental] Support ONNX Runtime quantization and inference for DirectML EP (750bb9)
[Quantization] Support low precision and Weight-Only Quantization (WOQ) algorithms, including RTN (501440, 19ab16, 859315), AWQ (2562f2, 641d42), GPTQ (b5ac3c, 6ba783) and TEQ (d2f995, 9ff7f0) for PyTorch
[Quantization] Support NF4 and FP4 data type for PyTorch Weight-Only Quantization (3d11b5)
[Quantization] Support low precision and Weight-Only Quantization algorithms, including RTN, AWQ and GPTQ for ONNX Runtime (da4c92)
[Quantization] Support layer-wise quantization (d9d1fc) and enable with SmoothQuant (ec9ae9)
[Pruning] Add sparseGPT pruner and refactor pruning class (88adfc)
[Pruning] Add Hyper-parameter Optimization algorithm for pruning (6613cf)
[Model Export] Support PT2ONNX dynamic quantization export (165532)

Improvement

[Common] Clean up dataloader usage in examples (1044d8, a2931e, 447cc7)
[Common] Enhance ONNX Runtime backend check (4ce9de)
[Strategy] Add block-wise distributed fallback in basic strategy (ea309f)
[Strategy] Enhance strategy exit policy (d19b42)
[Quantization] Add WeightOnlyLinear for Weight-Only approach to allow low memory inference (00bbf8)
[Quantization] Support more ONNX Runtime direct INT8 ops (b9ce61)
[Quantization] Support TensorFlow per-channel MatMul quantization (cf5589)
[Quantization] Implement a new method to perform alpha auto-tuning in SmoothQuant (084eda)
[Quantization] Enhance ONNX SmoothQuant tuning structure (f0d51c)
[Quantization] Enhance PyTorch SmoothQuant tuning structure (81da40)
[Quantization] Update PyTorch examples dataloader to support transformers 4.31.x (59371f)
[Quantization] Enhance ONNX Runtime backend setting for GPU EP support (295535)
[Pruning] Refactor pruning (92d14d)
[Mixed Precision] Update the list of supported layers for Keras mix-precision (692c8b)
[Mixed Precision] Introduce quant_level into mixed precision (0dc6a9)

Productivity

[Ecosystem] MSFT Olive integrate SmoothQuant and 3 LLM examples (#411, #412, #469)
[Ecosystem] MSFT ONNX Runtime integrate SmoothQuant static quantization (#16288)
[Neural Insights] Support PyTorch FX inspect tensor and integrate with Neural Insights (775def, 74a785)
[Neural Insights] Add step-by-step diagnosis cases (99c3b0)
[Neural Solution] Resource management and user-facing API enhancement (fbba10)
[Auto CI] Integrate auto CI code scan bug fix tools (f77a2c, 06cc38)

Bug Fixes

Fix bugs in PyTorch SmoothQuant (0349b9, 8f3645)
Fix PyTorch dataloader batch size issue (6a98d0)
Fix bugs for ONNX Runtime CUDA EP (a1b566, d1f315)
Fix bug in ONNX Runtime adapter where _rename_node function fails with model size > 2 GB (1f6b1a)
Fix ONNX Runtime diagnosis bug (f10e26)
Update Neural Solution example and fix grpc port issue (528868)
Fix the objective initialization issue (9d7546)
Fix reshape issue for bayesian strategy (77cb83)
Fix CVEs (d86922, 2bbfcd, fc71fa)

Examples

Add Weight-Only LLM examples for PyTorch (4b24be, 66f7c1, aa457a)
Add Weight-Only LLM examples for ONNX Runtime (10c133)
Enable 3 ONNX Runtime examples, CodeBert (5e584e), LayoutLMv2 FUNSD (5f0b17), Table Transformer (eb8a95)
Add ONNX Runtime LLM SmoothQuant example Llama-7B (7fbcf5)
Enable 2 TensorFlow examples, ViT (94df99), GraphSage (29ec82)
Add easy get started notebooks (d7b608, 6ee846)
Add multi-cards magnitude pruning use case (909618)
Unify ONNX Runtime prepare model scripts (5ecb13)

Validated Configurations

Centos 8.4 & Ubuntu 22.04
Python 3.7, 3.8, 3.9, 3.10, 3.11
TensorFlow 2.11, 2.12, 2.13
ITEX 1.1.0, 1.2.0, 2.13.0.0
PyTorch/IPEX 1.12.1+cpu, 1.13.0+cpu, 2.0.1+cpu
ONNX Runtime 1.13.1, 1.14.1, 1.15.1
MXNet 1.9.1

neural-compressor - Intel® Neural Compressor v2.2 Release

Published by chensuyue over 1 year ago

Highlights
Features
Improvement
Productivity
Bug Fixes
Examples
External Contributions

Highlights

Expanded SmoothQuant support on mainstream frameworks including PyTorch/IPEX, TensorFlow/ITEX, ONNX Runtime, and validated popular large language models (LLMs) such as GPT-J, LLaMA, OPT, BLOOM, Dolly, MPT, LaMini-LM and RedPajama-INCITE.
Introduced two productivity components: Neural Solution for distributed quantization and Neural Insights for quantization accuracy debugging.
Successfully integrated Intel Neural Compressor into MSFT Olive (#157) and DeepSpeed (#3300).
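
For context on the SmoothQuant highlight, the sketch below shows the smoothing transform as it is commonly described: a per-input-channel factor s_j = max|X_j|^alpha / max|W_j|^(1 - alpha) divides the activations and multiplies the weights, so the matrix product is unchanged while activation outliers shrink. It is a pure-Python sketch under that assumption, not the library's code; `smooth`, `matmul`, and the toy tensors are hypothetical.

```python
def matmul(x, w):
    """Plain matrix product; x is [samples, in], w is [in, out]."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def smooth(x, w, alpha=0.5, eps=1e-8):
    """Move quantization difficulty from activations to weights."""
    n_in = len(w)
    act_max = [max(max(abs(row[j]) for row in x), eps) for j in range(n_in)]
    wt_max = [max(max(abs(v) for v in w[j]), eps) for j in range(n_in)]
    s = [act_max[j] ** alpha / wt_max[j] ** (1 - alpha) for j in range(n_in)]
    x_s = [[row[j] / s[j] for j in range(n_in)] for row in x]
    w_s = [[v * s[j] for v in w[j]] for j in range(n_in)]
    return x_s, w_s, s


# Toy data: input channel 1 holds activation outliers and tiny weights.
x = [[1.0, 100.0], [2.0, -80.0]]
w = [[0.5, -0.2], [0.01, 0.02]]
x_s, w_s, s = smooth(x, w, alpha=0.5)
```

With alpha = 0.5 the largest activation magnitude here drops from 100 to roughly 1.41, and the scaled weight row rises to about the same range, which is the balancing effect that alpha tuning adjusts.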

Features

[Quantization] Support TensorFlow SmoothQuant (1f4127)
[Quantization] Support ITEX SmoothQuant (1f4127)
[Quantization] Support PyTorch FX SmoothQuant (6a39f6, 603811)
[Quantization] Support ONNX Runtime SmoothQuant (3df647, 1e1d70)
[Quantization] Support dictionary inputs for IPEX quantization (4ba233)
[Quantization] Enable calibration algorithm Entropy/KL & Percentile for ONNX Runtime (dae494)
[MixedPrecision] Support mixed precision op name/type dict option (a9c2cb)
[Strategy] Support block wise tuning (9c26ed)
[Strategy] Enable mse_v2 for ONNX Runtime (62122d)
[Pruning] Support retrain free sparse (d29aa0)
[Pruning] Support TensorFlow pruning with 2.x API (072c13)
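
The Percentile calibration option listed above differs from min-max calibration in how it picks the clipping range. A minimal sketch of the idea, assuming a nearest-rank percentile over absolute activation values (the library's exact method may differ; all names and data here are hypothetical):

```python
import math


def percentile_absmax(values, percentile=99.9):
    """Clipping threshold: the given percentile of |value| (nearest rank)."""
    mags = sorted(abs(v) for v in values)
    rank = max(1, math.ceil(percentile * len(mags) / 100))
    return mags[rank - 1]


def int8_scale(threshold):
    """Symmetric INT8 scale for a given clipping threshold."""
    return threshold / 127


# 99 well-behaved activations plus a single large outlier.
activations = [i / 100 for i in range(1, 100)] + [50.0]

minmax_threshold = percentile_absmax(activations, 100)    # includes the outlier
clipped_threshold = percentile_absmax(activations, 99)    # ignores the outlier
```

Clipping at the 99th percentile keeps the scale about 50 times finer for the bulk of the values, at the cost of saturating the outlier.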

Improvement

[Quantization] Enhance Keras functional model quantization with Keras model in, quantized Keras model out (699751)
[Quantization] Enhance MatMul and Gather quantization for ONNX Runtime (1f9c4f)
[Quantization] Add new recipe for ONNX Runtime NLP models (10d82c)
[MixedPrecision] Add more FP16 OPs support for ONNX Runtime (15d551)
[MixedPrecision] Add more BF16 OPs support for TensorFlow (369b9d)
[Pruning] Enhance multihead-attention slim (f3de50)
[Pruning] Enable progressive pruning in N:M pattern (483e80)
[Model Export] Refine PT2ONNX export (877adb)
Remove redundant classes for quantization, benchmark and mixed precision (c51096)

Productivity

[Neural Solution] Support multi-node distributed tuning with model-level parallelism (ee049c)
[Neural Insights] Support quantization and benchmark diagnosis with GUI (5dc9ea, 3bde2e, 898344)
[Neural Coder] Migrate Neural Coder support into 2.x API (113ca1, e74a8a)
[Ecosystem] MSFT Olive integration (#157)
[Ecosystem] MSFT DeepSpeed integration (#3300)
Support ITEX 1.2 (5519e2)
Support Python 3.11 (6fa053)
Enhance documentation for mixed precision, diagnosis, dataloader, metric, etc.

Bug Fixes

Fix ONNX Runtime SmoothQuant issues (85c6a0, 1b26c0)
Fix bug in IPEX fallback (b4f9c7)
Fix ITEX quantize/dequantize before BN u8 issue (5519e2)
Fix example inputs issue for IPEX smoothquant (c8b753)
Fix IPEX mixed precision (d1e734)
Fix inspect tensor (8f5f5d)
Fix PyTorch model peleenet, 3dunet accuracy issue after migrating to the 2.x API
Fix CVEs (04c482, efcd98, 6e9f7b, 7abe32)

Examples

Enable 4 ONNX Runtime examples, layoutlmv3, layoutlmft, deberta-v3, GPTJ-6B.
Enable 2 TensorFlow LLMs with SmoothQuant, facebook-opt-125m, gpt2-medium.

External Contributions

Add a mathematical check for SmoothQuant transform (5c04ac)
Fix mismatch absorb layers due to tracing and named modules for SmoothQuant (bccc89)
Fix trace issue when input is dictionary for SmoothQuant (6a3c64)
Allow dictionary model inputs for ONNX export (17b642)

Validated Configurations

Centos 8.4 & Ubuntu 22.04
Python 3.7, 3.8, 3.9, 3.10, 3.11
TensorFlow 2.10.0, 2.11.0, 2.12.0
ITEX 1.1.0, 1.2.0
PyTorch/IPEX 1.12.1+cpu, 1.13.0+cpu, 2.0.1+cpu
ONNX Runtime 1.13.1, 1.14.1, 1.15.0
MXNet 1.9.1

neural-compressor - Intel® Neural Compressor v2.1.1 Release

Published by chensuyue over 1 year ago

Bug Fixes
Examples

Bug Fixes

Fix calibration max value issue for SmoothQuant (commit b28bfd)
Fix exception for untraceable model during SmoothQuant (commit b28bfd)
Fix depthwise conv issue for SmoothQuant (commit 0e5942)
Fix Keras model mix precision convert issue (commit 997c57)

Examples

Add gpt-j alpha-tuning example (commit 3b7d28)
Migrate notebook example to the INC 2.0 API (commit 54d2f5)

Validated Configurations

Centos 8.4 & Ubuntu 22.04
Python 3.8
TensorFlow 2.11.0
ITEX 1.1.0
PyTorch/IPEX 1.13.0+cpu
ONNX Runtime 1.13.1
MXNet 1.9.1

neural-compressor - Intel® Neural Compressor v2.1 Release

Published by chensuyue over 1 year ago

Highlights
Features
Improvement
Bug Fixes
Examples
Documentation

Highlights

Support and enhance SmoothQuant on popular large language models (LLMs) (e.g., BLOOM-176B, OPT-30B, GPT-J-6B, etc.)
Support native Keras model quantization (Keras model as input, and quantized Keras model as output)
Provide auto-tuning strategy to improve quantization productivity
Support model conversion from TensorFlow INT8 to ONNX INT8 model
Polish documentation to make it easier for users to get started
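
The auto-tuning highlight above is about searching quantization configurations until accuracy is acceptable. The loop below is only a conceptual sketch of such an accuracy-driven search; the candidate names, the `tune` helper, the 1% tolerance, and the fake accuracy table are all assumptions for illustration, not the library's strategy code.

```python
def tune(candidates, evaluate, baseline, tolerable_loss=0.01, max_trials=100):
    """Return the first config whose relative accuracy loss is tolerable."""
    for trial, config in enumerate(candidates, start=1):
        if trial > max_trials:
            break
        accuracy = evaluate(config)
        relative_loss = (baseline - accuracy) / baseline
        if relative_loss <= tolerable_loss:
            return config, accuracy
    return None, None


# Hypothetical configs, ordered from most to least aggressive,
# with made-up accuracy numbers standing in for a real evaluation.
fake_accuracy = {
    "all_int8": 0.70,
    "fallback_softmax": 0.755,
    "fallback_softmax_and_matmul": 0.759,
}
best, acc = tune(list(fake_accuracy), fake_accuracy.get, baseline=0.76)
```

Ordering candidates from most to least aggressive means the search stops at the fastest configuration that still meets the accuracy goal.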

Features

[Quantization] Support SmoothQuant and verify with LLMs (commit cbb5cf) (commit 08e255) (commit 12c101)
[Quantization] Support Keras functional model quantization with Keras model in, quantized Keras model out (commit efd737)
[Strategy] Add auto quantization level as the default tuning process (commit cdfb99)
[Strategy] Integrate quantization recipes into tuning strategy (commit 44d176)
[Strategy] Extend the strategy capability for adding the new data type (commit d0059c)
[Strategy] Enable tuning strategy level multi-node distributed quantization (commit e1fe50)
[AMP] Support ONNX Runtime with FP16 (commit 108c24)
[Productivity] Export TensorFlow models into ONNX QDQ mode on both fp32 and int8 precision (commit 33a235)
[Productivity] Support PT/IPEX v2.0 (commit dbf138)
[Productivity] Support ONNX Runtime v1.14.1 (commit 146759)
[Productivity] GitHub IO docs support history versions

Improvement

Remove the dependency on experimental API (commit 6e10ef)
Enhance GUI diagnosis function on model graph and tensor histogram showing style (commit 9f0891)
Optimize memory usage for PyTorch adaptor (commit c295a7), ONNX adaptor (commit 8cbf2e), TensorFlow adaptor (commit ad0f1e), and tuning strategy (commit c49300) to support LLM
Refine ONNX Runtime QDQ quantization graph (commit c64a5b)
Enable ONNX model quantization with NVidia GPU TRT EP (commit ba42d0)
Improve code line coverage to 85%

Bug Fixes

Fix mix precision config setting (commit 4b71a8)
Fix multi-instance benchmark on Windows (commit 1f89aa)
Fix domain detection for large ONNX model (commit 70a566)

Examples

Migrate examples to the INC v2.0 API
Enable LLMs (e.g., GPT-NeoX, T5 Large, BLOOM-176B, OPT-30B, GPT-J-6B, etc.)
Enable examples for Keras in Keras out (commit efd737)
Enable multi-node training examples on CPU (e.g., RN50 distillation, QAT, pruning examples)
Add 15+ Huggingface (HF) examples with ONNX Runtime backend and update quantized models into HF (commit a4228d)
Add 2 examples for PT2ONNX model export (commit 26db4a)

Documentation

Polish documentation with a simplified GitHub main page, an easy-to-read IO Docs structure, a hands-on API migration user guide, more detailed new API instructions, a refreshed API docs template, etc.

Validated Configurations

Centos 8.4 & Ubuntu 22.04
Python 3.7, 3.8, 3.9, 3.10
TensorFlow 2.10.1, 2.11.0, 2.12.0
ITEX 1.0.0, 1.1.0
PyTorch/IPEX 1.12.1+cpu, 1.13.0+cpu, 2.0.0+cpu
ONNX Runtime 1.12.1, 1.13.1, 1.14.1
MXNet 1.9.1

neural-compressor - Intel® Neural Compressor v2.0 Release

Published by kevinintel almost 2 years ago

Highlights
Features
Bug Fixes
Examples
Documentation

Highlights

Support the quantization for Intel® Xeon® Scalable Processors (e.g., Sapphire Rapids), Intel® Data Center GPU Flex Series, and Intel® Max Series CPUs & GPUs
Provide the new unified APIs for post-training optimizations (static/dynamic quantization) and during-training optimizations (quantization-aware training, pruning/sparsity, distillation, etc.)
Support advanced fine-grained automatic mixed precision (AMP) across all supported precisions (e.g., INT8, BF16, and FP32)
Improve the model conversion from PyTorch INT8 model to ONNX INT8 model
Support the zero-code quantization in Visual Studio Code and JupyterLab with Neural Coder plugins
Support the quantization for 10K+ transformer-based models including large language models (e.g., T5, GPT, Stable Diffusion, etc.)
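
The static/dynamic distinction in the post-training highlight comes down to when the activation range is measured. The sketch below illustrates this with a generic asymmetric uint8 scheme: static quantization reuses a range fixed during calibration, while dynamic quantization derives it from each input. The helper names and numbers are assumptions for the example, not the unified API introduced in this release.

```python
def affine_params(lo, hi, bits=8):
    """Scale and zero point mapping [lo, hi] onto unsigned integers."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)   # the range must contain zero
    qmax = 2 ** bits - 1
    scale = (hi - lo) / qmax
    if scale == 0:
        scale = 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, bits=8):
    """Quantize floats to unsigned integer codes, clipping to the valid range."""
    qmax = 2 ** bits - 1
    return [min(qmax, max(0, round(v / scale) + zero_point)) for v in values]


# Static: the range [-1, 3] was observed once on calibration data.
static_params = affine_params(-1.0, 3.0)

# Dynamic: the range is computed from the tensor being quantized.
runtime_input = [0.0, 10.0]
dynamic_params = affine_params(min(runtime_input), max(runtime_input))

static_codes = quantize(runtime_input, *static_params)
dynamic_codes = quantize(runtime_input, *dynamic_params)
```

Here the static path saturates the value 10.0 because it lies outside the calibrated range, while the dynamic path adapts at the cost of computing the range at inference time.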

Features

[Quantization] Experimental Keras model in, quantized Keras model out (commit 4fa753)
[Quantization] Support quantization for ITEX v1.0 on Intel CPU and Intel GPU (commit a2fcb2)
[Quantization] Support hardware-neutral quantized ONNX QDQ models and validate on multiple devices (Intel CPU, NVidia GPU, AMD CPU, and ARM CPU) through ONNX Runtime
[Quantization] Enhance TensorFlow QAT: remove TFMOT dependency (commit 1deb7d)
[Quantization] Distinguish frameworks, backends and output formats for OnnxRuntime backend (commit 2483a8)
[Quantization] Support PyTorch/IPEX 1.13 and TensorFlow 2.11 (commit b7a2ef)
[AMP] Support more TensorFlow bf16 ops (commit 98d3c8)
[AMP] Add torch.amp bf16 support for IPEX backend (commit 2a361b)
[Strategy] Add accuracy-first tuning strategies: MSE_v2 (commit 80311f) and HAWQ (commit 83018e) to solve the accuracy problem of specific models
[Strategy] Refine the tuning strategy: add more data types and more op attributes such as per-tensor/per-channel, dynamic/static, etc.
[Pruning] Add progressive pruning and pattern lock pruning_type (commit f46bb1)
[Pruning] Add per_channel sparse pattern (commit f46bb1)
[Distillation] Support self-distillation towards efficient and compact neural networks (commit acdd4c)
[Distillation] Enhance API of intermediate layers knowledge distillation (commit 3183f6)
[Neural Coder] Detect devices and ISA to adjust the optimization (commit 691d0b)
[Neural Coder] Automatically quantize with ONNX Runtime backend (commit f711b4)
[Neural Coder] Add Neural Coder Python Launcher (commit 7bb92d)
[Neural Coder] Add Visual Studio Plugin (commit dd39ca)
[Productivity] Support Pruning in GUI (commit d24fea)
[Productivity] Use config-driven API to replace yaml
[Productivity] Export ONNX QLinear to QDQ format (commit e996a9)
[Productivity] Validate 10K+ transformer-based models including large language models (e.g., T5, GPT, Stable Diffusion, etc.)

Bug Fixes

Fix quantization failure of ONNX models larger than 2 GB (commit 8d83cc)
Fix bf16 disabled by default (commit 83825a)
Fix PyTorch DLRM quantization out of memory (commit ff1725)
Fix ITEX resnetv2_50 tuning accuracy (commit ae1e05)
Fix bf16 ops error in QAT when torch version < 1.11 (commit eda8cb)
Fix the key comparison in the Bayesian strategy (commit 1e9c12)
Fix static quantization failure for PyTorch T5 (commit ee3ef0)

Examples

Add quantization examples of HuggingFace models with OnnxRuntime backend (commit f4aeb5)
Add large language model quantization example: GPT-J (commit 01899d)
Add Distributed Distillation examples: MobileNetV2 (commit d33ebe) and CNN-2 (commit ebe9e2)
Update examples with INC v2.0 new API
Add Stable Diffusion example

Documentation

Update the accuracy of broad hardware (commit 71b056)
Refine API helper and documents

Validated Configurations

Centos 8.4 & Ubuntu 20.04
Python 3.7, 3.8, 3.9, 3.10
TensorFlow 2.9.3, 2.10.1, 2.11.0, ITEX 1.0
PyTorch/IPEX 1.11.0+cpu, 1.12.1+cpu, 1.13.0+cpu
ONNX Runtime 1.11.0, 1.12.1, 1.13.1
MxNet 1.7.0, 1.8.0, 1.9.1

neural-compressor - Intel® Neural Compressor v1.14.2 Release

Published by kevinintel almost 2 years ago

Highlights
Features
Bug Fixes
Examples

Highlights

This release adds experimental quantization support for ITEX v1.0 on Intel CPU and GPU, making it the first release to support quantization on Intel GPU. It also supports hardware-neutral quantized ONNX models, validated on multiple devices (Intel CPU, NVIDIA GPU, AMD CPU, and ARM CPU) through ONNX Runtime.

Features

Support quantization on PyTorch v1.13 (commit 97c946)
Add experimental quantization support for ITEX v1.0 on Intel CPU and GPU (commit a2fcb2)
Support GUI on native Windows (commit fe9923)
Support INT8 model load and save API with IPEX backend (commit 23c585)

Bug Fixes

Fix GPT2 quantization failure with ONNX Runtime backend (commit aea121)

Examples

Support personalized Stable Diffusion with few-shot fine-tuning (commit 4247fd)
Add ITEX examples efficientnet_v2_b0, mobilenet_v1, mobilenet_v2, inception_resnet_v2, inception_v3, resnet101, resnet50, vgg16, xception, densenet121, etc. (commit 6ab557)
Validate quantized ONNX model on multiple devices (Intel CPU, NVIDIA GPU, AMD CPU, and ARM CPU) (commit 288340)

Validated Configurations

Centos 8.4
Python 3.8
TensorFlow 2.10, ITEX 1.0
PyTorch 1.12.0+cpu, 1.13.0+cpu, IPEX 1.12.0
ONNX Runtime 1.12
MxNet 1.9

neural-compressor - Intel® Neural Compressor v1.14.1 Release

Published by kevinintel about 2 years ago

Bug Fixes
Productivity
Examples

Bug Fixes

Fix name matching issue of scale and zero-point in PyTorch (commit fd7a53)
Fix incorrect output quantization mode of MatMul + Relu fusion in TensorFlow (commit 9b5293)

Productivity

Support ONNX models with Python 3.10 (commit 2faf0b)
Use the TensorFlow create_file_writer API to support TensorBoard histograms (commit f34852)

Examples

Add NAS notebooks (commit 5f0adf)
Add Bert mini 2:4, 1x4 and mixed examples with new Pruning API (commit a52074)
Add keras in, saved_model out resnet101, inception_v3, mobilenetv2, xception, resnetv2 examples (commit fdd40e)

Validated Configurations

Python 3.7, 3.8, 3.9, 3.10
Centos 8.3 & Ubuntu 18.04 & Win10
TensorFlow 2.9, 2.10
Intel TensorFlow 2.7, 2.8, 2.9
PyTorch 1.10.0+cpu, 1.11.0+cpu, 1.12.0+cpu
IPEX 1.10.0, 1.11.0, 1.12.0
MxNet 1.7, 1.9
ONNX Runtime 1.10, 1.11, 1.12

neural-compressor - Intel® Neural Compressor v1.14 Release

Published by kevinintel about 2 years ago

Highlights
New Features
Improvements
Bug Fixes
Productivity
Examples

Highlights
We are excited to announce the release of Intel® Neural Compressor v1.14! This release introduces a new Pruning API for PyTorch that lets users select better combinations of criteria, pattern, and scheduler to achieve better pruning accuracy. It also supports Keras input for TensorFlow quantization, and self-distilled quantization for better quantization accuracy.

New Features

Pruning/Sparsity
- Support new structured sparse patterns N in M and NxM (commit 6cec70)
- Add pruning criteria snip and snip momentum (commit 6cec70)
- Add iterative pruning and decay types (commit 6cec70)
Quantization
- Support different Keras formats (h5, keras, keras saved model) as input and output of TensorFlow saved model (commit 5a6f09)
- Enable Distillation for Quantization (commit 03f1f3 & e20c76)
GUI
- Add mixed precision (commit 26e902)
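
To make the "N in M" structured sparse pattern listed under Pruning/Sparsity more concrete, the sketch below keeps the largest-magnitude weights in every block of M consecutive weights and zeroes the rest. Whether N counts kept or pruned weights varies between tools, so this example names the parameter `n_keep` explicitly; the function and sample values are hypothetical and do not reflect the new Pruning API.

```python
def prune_n_in_m(weights, n_keep=2, m=4):
    """Keep the n_keep largest-magnitude weights in each block of m."""
    out = []
    for start in range(0, len(weights), m):
        block = weights[start:start + m]
        order = sorted(range(len(block)), key=lambda i: abs(block[i]), reverse=True)
        keep = set(order[:n_keep])
        out.extend(w if i in keep else 0.0 for i, w in enumerate(block))
    return out


# Two blocks of four weights, pruned to a 2-in-4 pattern.
weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.1]
sparse = prune_n_in_m(weights, n_keep=2, m=4)
```

Because every block has the same number of non-zeros, the pattern is regular enough for hardware or kernels that exploit fixed-ratio sparsity.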

Improvement

Enhance tuning for Quantization with IPEX 1.12 to remove additional Quant/DeQuant (commit 192100)
Add upstream and download API for HuggingFace model hub, which can handle configuration files, tokenizer files and int8 model weights in the format of transformers (commit 46d945)
Align with Intel PyTorch extension new API (commit cc368a)
Add load with yaml and pt to be compatible with older PyTorch model saving type (commit a28705)

Bug Fixes

Quantization
- Fix data type of ONNX Runtime quantization from fp64 to fp32 (commit cb7b48)
- Fix MXNET config issue with default config (commit b75ff2)
Export
- Fix export_to_onnx API (commit 158c7f)

Productivity

Support TensorFlow 2.10.0 (commit d6b6c9 & 8130e7)
Support OnnxRuntime 1.12 (commit 498ac4)
Export PyTorch QAT to ONNX (commit 029a63)
Add TensorFlow and PyTorch container tpp file (commit d245b5)

Examples

Add example of download from HuggingFace model hub and example of upstream models to the hub (commit 46d945)
Add notebooks for Neural Coder (commit 105db7)
Add 2 IPEX examples: bert_large (squad), distilbert_base (squad) (commit 192100)
Add 2 DDP examples for prune once for all: roberta-base and Bert Base (commit 26a476)

Validated Configurations

Python 3.7, 3.8, 3.9, 3.10
Centos 8.3 & Ubuntu 18.04 & Win10
TensorFlow 2.9, 2.10
Intel TensorFlow 2.7, 2.8, 2.9
PyTorch 1.10.0+cpu, 1.11.0+cpu, 1.12.0+cpu
IPEX 1.10.0, 1.11.0, 1.12.0
MxNet 1.7, 1.9
ONNX Runtime 1.10, 1.11, 1.12

neural-compressor - Intel® Neural Compressor v1.13.1 Release

Published by chensuyue about 2 years ago

Features

Support experimental auto-coding quantization for PyTorch
- Post-training static and dynamic quantization for PyTorch
- Post-training static quantization for IPEX
- Mixed-precision (BF16, INT8, and FP32) for PyTorch
Refactor quantization utilities for ONNX Runtime

Bug Fixes

Fixed model compression orchestration issue caused by PyTorch v1.11
Fixed GUI issues

Validated Configurations

Python 3.8
Centos 8.4
TensorFlow 2.9
Intel TensorFlow 2.9
PyTorch 1.12.0+cpu
IPEX 1.12.0
MXNet 1.7.0
ONNX Runtime 1.11.0

neural-compressor - Intel® Neural Compressor v1.13 Release

Published by ftian1 about 2 years ago

Features

Quantization
- Support new quantization APIs for Intel TensorFlow
- Support FakeQuant (QDQ) quantization format for ITEX
- Improve INT8 quantization recipes for ONNX Runtime
Mixed Precision
- Enhance mixed precision interface to support BF16 (FP16) mixed with FP32
Neural Architecture Search
- Support SuperNet-based neural architecture search (DyNAS)
Sparsity
- Support training for block-wise structured sparsity
Strategy
- Support operator-type based tuning strategy
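
For the BF16 mixed precision entry above, it can help to see how BF16 relates to FP32: it keeps the FP32 sign and exponent and only the top 7 mantissa bits. The sketch below converts through the raw bit pattern with round-to-nearest-even, using only the standard library. It is a generic illustration that does not handle NaN specially and says nothing about how the mixed precision interface is implemented; the helper names are hypothetical.

```python
import struct


def to_bf16_bits(x):
    """Round an FP32 value to BF16 and return the 16-bit pattern."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    # Round to nearest, ties to even, on the 16 bits being dropped.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def from_bf16_bits(b):
    """Expand a 16-bit BF16 pattern back to a Python float."""
    return struct.unpack(">f", struct.pack(">I", b << 16))[0]


one = to_bf16_bits(1.0)                          # exactly representable
approx_pi = from_bf16_bits(to_bf16_bits(3.14159))  # loses mantissa precision
```

The range matches FP32 because the exponent is untouched, while precision drops to roughly two to three decimal digits.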

Productivity

Support light (default) and full binary packages (default package size 0.5MB, full package size 2MB)
Add experimental accuracy diagnostic feature for INT8 quantization including tensor statistics visualization and fine-grained precision setting
Add experimental one-click BF16/INT8 low precision enabling & inference optimization, first-ever code-free solution in industry

Ecosystem

Upstream 4 more quantized models (emotion_ferplus, ultraface, arcface, bidaf) to ONNX Model Zoo
Upstream 10 quantized Transformers-based models to HuggingFace Model Hub

Examples

Add notebooks for Quantization on Intel DevCloud, Distillation/Sparsity/Quantization for BERT-Mini SST-2, and Neural Architecture Search (DyNAS)
Add more quantization examples from TensorFlow Model Zoo

Validated Configurations

Python 3.8, 3.9, 3.10
Centos 8.3 & Ubuntu 18.04 & Win10
TensorFlow 2.7, 2.8, 2.9
Intel TensorFlow 2.7, 2.8, 2.9
PyTorch 1.10.0+cpu, 1.11.0+cpu, 1.12.0+cpu
IPEX 1.10.0, 1.11.0, 1.12.0
MXNet 1.6.0, 1.7.0, 1.8.0
ONNX Runtime 1.9.0, 1.10.0, 1.11.0

neural-compressor - Intel® Neural Compressor v1.12 Release

Published by ftian1 over 2 years ago

Features

Quantization
- Support accuracy-aware AMP (INT8/BF16/FP32) on PyTorch
- Improve post-training quantization (static & dynamic) on PyTorch
- Improve post-training quantization on TensorFlow
- Improve QLinear and QDQ quantization modes on ONNX Runtime
- Improve accuracy-aware AMP (INT8/FP32) on ONNX Runtime
Pruning
- Improve pruning-once-for-all for NLP models
Sparsity
- Support experimental sparse kernel for reference examples

Productivity

Support model deployment by loading INT8 models directly from HuggingFace model hub
Improve GUI with optimized model downloading, performance profiling, etc.

Ecosystem

Highlight simple quantization usage with a few clicks on ONNX Model Zoo
Upstream INC quantized models (ResNet101, Tiny YoloV3) to ONNX Model Zoo

Examples

Add Bert-mini distillation + quantization notebook example
Add DLRM & SSD-ResNet34 quantization examples on IPEX
Improve BERT structured sparsity training example

Validated Configurations

Python 3.8, 3.9, 3.10
Centos 8.3 & Ubuntu 18.04 & Win10
TensorFlow 2.6.2, 2.7, 2.8
Intel TensorFlow 1.15.0 UP3, 2.7, 2.8
PyTorch 1.8.0+cpu, 1.9.0+cpu, 1.10.0+cpu
IPEX 1.8.0, 1.9.0, 1.10.0
MXNet 1.6.0, 1.7.0, 1.8.0
ONNX Runtime 1.8.0, 1.9.0, 1.10.0

neural-compressor - Intel® Neural Compressor v1.11 Release

Published by ftian1 over 2 years ago

Features

Quantization
- Supported QDQ as experimental quantization format for ONNX Runtime
- Improved FX symbolic tracing for PyTorch
- Supported multi-metrics for quantization tuning
Knowledge distillation
- Improved distillation algorithm for intermediate layer knowledge transfer
Productivity
- Improved quantization productivity for ONNX Runtime through GUI
- Improved PyTorch INT8 model save/load methods
Ecosystem
- Upstreamed INC quantized Yolov3, DenseNet, Mask-Rcnn, Yolov4 models to ONNX Model Zoo
- Became a PyTorch ecosystem tool shortly after publishing the PyTorch INC tutorial
Examples
- Added INC quantized ResNet50 v1.5 and BERT-Large model for IPEX
- Supported dynamic quantization & weight sharing on bare metal reference engine

neural-compressor - Intel® Neural Compressor v1.10 Release

Published by ftian1 over 2 years ago

Features

Quantization
- Supported the quantization on latest deep learning frameworks
- Supported the quantization for a new model domain (Audio)
- Supported the compatible quantization recipes for framework upgrade
Pruning & Knowledge distillation
- Supported fine-tuning and quantization using INC & Optimum for “Prune Once for All: Sparse Pre-Trained Language Models” published at ENLSP NeurIPS Workshop 2021
Structured sparsity
- Validated the sparsity training recipes across multiple model domains (CV, NLP, and Recommendation System)

Productivity

Improved INC GUI for easy quantization
Supported Windows OS conda installation

Ecosystem

Upgraded INC v1.9 into HuggingFace Optimum
Upstreamed INC quantized mobilenet & faster-rcnn models to ONNX Model Zoo

Examples

Supported quantization on 300 random models
Added bare-metal examples for Bert-mini and DLRM

Validated Configurations

Python 3.7, 3.8, 3.9
Centos 8.3 & Ubuntu 18.04 & Win10
TensorFlow 2.6.2, 2.7, 2.8
Intel TensorFlow 1.15.0 UP3, 2.7, 2.8
PyTorch 1.8.0+cpu, 1.9.0+cpu, 1.10.0+cpu
IPEX 1.8.0, 1.9.0, 1.10.0
MXNet 1.6.0, 1.7.0, 1.8.0
ONNX Runtime 1.8.0, 1.9.0, 1.10.0

Distribution:

	Channel	Links	Install Command
Source	Github	https://github.com/intel/neural-compressor.git	$ git clone https://github.com/intel/neural-compressor.git
Binary	Pip	https://pypi.org/project/neural-compressor	$ pip install neural-compressor
Binary	Conda	https://anaconda.org/intel/neural-compressor	$ conda install neural-compressor -c conda-forge -c intel

Contact:

Please feel free to contact [email protected] if you have any questions.

neural-compressor - Intel® Neural Compressor v1.9 Release

Published by ftian1 almost 3 years ago

Features

Knowledge distillation
- Supported one-shot compression pipelines (knowledge distillation during quantization-aware training) on PyTorch
- Added more distillation examples on TensorFlow and PyTorch
Quantization
- Supported multi-objective tuning for quantization
- Supported Intel Extension for PyTorch v1.10 version
- Improved quantization-aware training support on PyTorch v1.10
Pruning
- Added more magnitude pruning examples on TensorFlow
Reference bare-metal examples
- Supported BF16 optimizations on NLP models
- Added sparse DLRM model (experimental)
Productivity
- Added a Python-friendly API (alternative to the YAML configuration file)
- Made user-facing APIs more Pythonic
Ecosystem
- Integrated pruning API into HuggingFace Optimum
- Added ssd-mobilenetv1, efficientnet, ssd, fcn_rn50, inception_v1 quantized models to ONNX Model Zoo
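
As background for the magnitude pruning examples mentioned above: magnitude pruning removes the weights with the smallest absolute values until a target sparsity is reached. The following is a minimal sketch of that idea in plain Python, assuming a flat list of weights and a single global threshold. `magnitude_prune` is a hypothetical helper for illustration and does not reflect the Neural Compressor pruning API.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of weights with the smallest-magnitude fraction zeroed."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    num_to_prune = int(len(weights) * sparsity)
    # Indices ordered by ascending magnitude; the first num_to_prune are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:num_to_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


if __name__ == "__main__":
    print(magnitude_prune([0.9, -0.1, 0.4, -0.05], sparsity=0.5))
```

In practice pruning is usually applied gradually during fine-tuning so the remaining weights can compensate; this sketch shows only the one-shot selection step.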

Validated Configurations

Python 3.7 & 3.8 & 3.9
Centos 8.3 & Ubuntu 18.04
TensorFlow 2.6.2 & 2.7
Intel TensorFlow 2.4.0, 2.5.0 and 1.15.0 UP3
PyTorch 1.8.0+cpu, 1.9.0+cpu, IPEX 1.8.0
MXNet 1.6.0, 1.7.0, 1.8.0
ONNX Runtime 1.6.0, 1.7.0, 1.8.0

Package Rankings

Top 1.78% on Pypi.org

Top 14.45% on Npmjs.org

Top 6.75% on Proxy.golang.org
