🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
Apache-2.0 License
Published by echarlaix almost 2 years ago
- Add `ORTModel` inference to `ORTTrainer` and `ORTSeq2SeqTrainer` (#189)
- Add the possibility to pass `InferenceSession` options and provider to `ORTModel` (#271); see the sketch after this list
- Add `ORTOptimizer` support for `torch.fx` transformations (#348)
- `torch.fx` transformations now use the marking methods `mark_as_transformed`, `mark_as_restored` and `get_transformed_nodes` (#385)
- Fix `BaseConfig` for the `transformers` 4.22.0 release (#386)
- Fix `ORTTrainer` for the `transformers` 4.22.1 release (#388)
- Add `provider_options` to `ORTModel` (#401)
- `ORTModel` […], as `transformers` does for pipelines (#427)
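Below is a minimal sketch of passing `InferenceSession` options and an execution provider when loading an `ORTModel`; the keyword names follow the PRs above but should be treated as assumptions, and the checkpoint is only an example:

```python
import onnxruntime
from optimum.onnxruntime import ORTModelForSequenceClassification

# Configure the underlying ONNX Runtime InferenceSession
session_options = onnxruntime.SessionOptions()
session_options.intra_op_num_threads = 2

model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",
    from_transformers=True,
    provider="CPUExecutionProvider",  # execution provider (#271)
    session_options=session_options,  # InferenceSession options (#271)
    provider_options=None,            # per-provider settings (#401)
)
```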
Published by echarlaix about 2 years ago
- Refactoring of the `ORTQuantizer` (#270) and `ORTOptimizer` (#294)
- Add `ORTModelForCustomTasks`, allowing ONNX Runtime inference support for custom tasks (#303); see the sketch after this list
- Add `ORTModelForMultipleChoice`, allowing ONNX Runtime inference for models with a multiple choice classification head (#358)
- Add `FuseBiasInLinear`, a transformation that fuses the weight and the bias of linear modules (#253)
- Add support for `past_key_values` during ONNX Runtime inference of Seq2Seq models (#241)
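A minimal sketch of `ORTModelForCustomTasks`; the checkpoint (a sentence-transformers model exposing a pooler output) comes from the Optimum documentation, so treat it as an assumption:

```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCustomTasks

model_id = "optimum/sbert-all-MiniLM-L6-with-pooler"
model = ORTModelForCustomTasks.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("I love my new laptop!", return_tensors="pt")
outputs = model(**inputs)  # dict keyed by the model's custom ONNX output names
```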
Published by echarlaix over 2 years ago
The `optimum.fx.optimization` module (#232) provides a set of `torch.fx` graph transformations, along with classes and functions to write your own transformations and compose them.
- `Transformation` and `ReversibleTransformation` represent non-reversible and reversible transformations; it is possible to write such transformations by inheriting from those classes
- The `compose` utility function enables transformation composition (see the sketch after this list)
- `MergeLinears`: merges linear layers that have the same input
- `ChangeTrueDivToMulByInverse`: changes a division by a static value into a multiplication by its inverse
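As an illustration, here is a minimal sketch composing the two built-in transformations on a traced BERT model; the checkpoint and input names are assumptions:

```python
from transformers import BertModel
from transformers.utils.fx import symbolic_trace
from optimum.fx.optimization import ChangeTrueDivToMulByInverse, MergeLinears, compose

model = BertModel.from_pretrained("bert-base-uncased")
# torch.fx transformations operate on a traced GraphModule
traced = symbolic_trace(model, input_names=["input_ids", "attention_mask", "token_type_ids"])

# compose chains several transformations into a single one
transformation = compose(MergeLinears(), ChangeTrueDivToMulByInverse())
transformed_model = transformation(traced)
```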
`ORTModelForSeq2SeqLM` (#199) allows ONNX export and ONNX Runtime inference for Seq2Seq models.
Below is an example that downloads a T5 model from the Hugging Face Hub, exports it through the ONNX format and saves it:
```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM

# Load model from hub and export it through the ONNX format
model = ORTModelForSeq2SeqLM.from_pretrained("t5-small", from_transformers=True)

# Save the exported model in the given directory
output_dir = "t5_onnx"  # any local directory
model.save_pretrained(output_dir)
```
`ORTModelForImageClassification` (#226) allows ONNX Runtime inference for models with an image classification head.
Below is an example that downloads a ViT model from the Hugging Face Hub, exports it through the ONNX format and saves it:
```python
from optimum.onnxruntime import ORTModelForImageClassification

# Load model from hub and export it through the ONNX format
model = ORTModelForImageClassification.from_pretrained("google/vit-base-patch16-224", from_transformers=True)

# Save the exported model in the given directory
output_dir = "vit_onnx"  # any local directory
model.save_pretrained(output_dir)
```
Adds support for converting model weights from fp32 to fp16 by adding a new optimization parameter (`fp16`) to `OptimizationConfig` (#273).
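A minimal sketch, assuming `OptimizationConfig` is importable from `optimum.onnxruntime.configuration`:

```python
from optimum.onnxruntime.configuration import OptimizationConfig

# optimization_level=1 enables basic graph optimizations;
# fp16=True converts the model weights from fp32 to fp16
optimization_config = OptimizationConfig(optimization_level=1, fp16=True)
```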
Additional pipeline tasks are now supported, each with a default model.
Below is an example that downloads a T5 small model from the Hub and loads it with the transformers pipeline for translation:
```python
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("optimum/t5-small")
model = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small")
onnx_translation = pipeline("translation_en_to_fr", model=model, tokenizer=tokenizer)

text = "What a beautiful day !"
pred = onnx_translation(text)
# [{'translation_text': "C'est une belle journée !"}]
```
The `ORTModelForXXX` execution provider now defaults to `CPUExecutionProvider` (#203). Previously, if no execution provider was specified, it was set to `CUDAExecutionProvider` when a GPU was detected, and to `CPUExecutionProvider` otherwise.
Published by echarlaix over 2 years ago
- `optimum-intel` (#212)
- `ORTModel` support for optimized and quantized models (#214)
Published by echarlaix over 2 years ago
- Extend `QuantizationPreprocessor` to dynamic quantization (https://github.com/huggingface/optimum/pull/196)
- `huggingface_hub` version and `protobuf` fix (https://github.com/huggingface/optimum/pull/205)
Published by echarlaix over 2 years ago
Add support for Python 3.7 (https://github.com/huggingface/optimum/pull/176)
Published by echarlaix over 2 years ago
`ORTModelForXXX` classes such as `ORTModelForSequenceClassification` were integrated with the Hugging Face Hub in order to easily export models through the ONNX format, load ONNX models, and easily save the resulting model or push it to the 🤗 Hub using the `save_pretrained` and `push_to_hub` methods, respectively. An already optimized and/or quantized ONNX model can also be loaded using the `ORTModelForXXX` classes with the `from_pretrained` method.
Below is an example that downloads a DistilBERT model from the Hub, exports it through the ONNX format and saves it:
```python
from optimum.onnxruntime import ORTModelForSequenceClassification

# Load model from hub and export it through the ONNX format
model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",
    from_transformers=True,
)

# Save the exported model
model.save_pretrained("a_local_path_for_convert_onnx_model")
```
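The saved model can then be shared on the Hub with `push_to_hub`. A hypothetical sketch; the repository name and the exact argument names are assumptions:

```python
# Push the exported ONNX model to the Hugging Face Hub
# (assumed signature: local directory plus target repository id)
model.push_to_hub(
    "a_local_path_for_convert_onnx_model",
    repository_id="my-username/distilbert-onnx",
    use_auth_token=True,
)
```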
Built-in support for transformers pipelines was added. This allows you to leverage the same API used with Transformers, with the power of accelerated runtimes such as ONNX Runtime.
Several tasks are currently supported, each with a default model.
Below is an example that downloads a RoBERTa model from the Hub, exports it through the ONNX format and loads it with the `transformers` pipeline for `question-answering`.
```python
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForQuestionAnswering

# load vanilla transformers model and convert it to ONNX
model = ORTModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2", from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")

# test the model with the transformers pipeline, with handle_impossible_answer for squad_v2
optimum_qa = pipeline("question-answering", model=model, tokenizer=tokenizer, handle_impossible_answer=True)
prediction = optimum_qa(
    question="What's my name?", context="My name is Philipp and I live in Nuremberg."
)

print(prediction)
# {'score': 0.9041663408279419, 'start': 11, 'end': 18, 'answer': 'Philipp'}
```
- `ORTTrainer` […], previously not enabled when inference was performed with ONNX Runtime (#152)
Published by JingyaHuang over 2 years ago
- […] `ORTModel`.
- Add the `IncludeFullyConnectedNodes` class to find the nodes composing the fully connected layers, in order to (only) target the latter for quantization and limit the accuracy drop.
- Update `QuantizationPreprocessor` so that the intersection of the two sets representing the nodes to quantize and the nodes to exclude from quantization is an empty set.
- Rename `Seq2SeqORTTrainer` to `ORTSeq2SeqTrainer` for clarity and to keep consistency.
- Add `ORTOptimizer` support for ELECTRA models.
- […] `ORTConfig`, which contains the optimization and quantization config.
Published by echarlaix over 2 years ago
The `ORTTrainer` and `Seq2SeqORTTrainer` are two new experimental classes.
- `ORTTrainer` and `Seq2SeqORTTrainer` were created to have a user-facing API similar to the `Trainer` and `Seq2SeqTrainer` of the Transformers library (see the sketch after this list).
- `ORTTrainer` allows using the ONNX Runtime backend to train a given PyTorch model in order to accelerate training. ONNX Runtime will run the forward and backward passes using an optimized, automatically-exported ONNX computation graph, while the rest of the training loop is executed by native PyTorch.
- `ORTTrainer` allows using ONNX Runtime inferencing during both the evaluation and the prediction steps.
- In the `Seq2SeqORTTrainer`, ONNX Runtime inferencing is incompatible with `--predict_with_generate`, as the generate method is not supported yet.
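A minimal sketch of the `ORTTrainer` API mirroring `transformers.Trainer`; the toy dataset and the `feature` argument are assumptions for illustration:

```python
from datasets import Dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TrainingArguments
from optimum.onnxruntime import ORTTrainer

model_id = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Tiny toy dataset, only to keep the sketch self-contained
dataset = Dataset.from_dict({"text": ["great movie", "terrible movie"], "label": [1, 0]})
dataset = dataset.map(lambda e: tokenizer(e["text"], truncation=True, padding="max_length", max_length=32))

trainer = ORTTrainer(
    model=model,
    args=TrainingArguments(output_dir="ort_out", num_train_epochs=1),
    train_dataset=dataset,
    eval_dataset=dataset,
    feature="sequence-classification",  # ONNX export feature name (assumption for this version)
)
trainer.train()  # forward/backward run on the exported ONNX graph (requires onnxruntime-training)
```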
The `ORTQuantizer` and `ORTOptimizer` classes underwent a massive refactoring that should allow a simpler and more flexible user-facing API.
- The calibration dataset can now be fit in multiple passes with the `ORTQuantizer` method `partial_fit`. This is especially useful when using memory-hungry calibration methods such as the Entropy and Percentile methods.
- `OptimizationConfig`, `QuantizationConfig` and `CalibrationConfig` were added in order to better segment the different ONNX Runtime related parameters, instead of having one unique configuration, `ORTConfig` (see the sketch after this list).
- The `QuantizationPreprocessor` class was added in order to find the nodes to include and/or exclude from quantization, by finding the nodes following a given pattern (such as the nodes forming LayerNorm, for example). This is particularly useful in the context of static quantization, where the quantization of modules such as LayerNorm or GELU is responsible for an important drop in accuracy.
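A minimal sketch of the segmented configurations, assuming the `AutoQuantizationConfig` helper from `optimum.onnxruntime.configuration`:

```python
from optimum.onnxruntime.configuration import AutoQuantizationConfig, OptimizationConfig

# Dynamic quantization targeting AVX512 instructions
qconfig = AutoQuantizationConfig.avx512(is_static=False, per_channel=False)

# Graph optimization settings now live in their own object
oconfig = OptimizationConfig(optimization_level=1)
```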
Published by echarlaix over 2 years ago
- The `ORTConfig` class was introduced, allowing the user to define the desired export, optimization and quantization strategies.
- The `ORTOptimizer` class takes care of the model's ONNX export as well as the graph optimization provided by ONNX Runtime. In order to create an instance of `ORTOptimizer`, the user needs to provide an `ORTConfig` object defining the export and graph-level transformation information. Optimization can then be performed by calling the `ORTOptimizer.fit` method (a sketch follows below).
- Quantization is handled by the `ORTQuantizer` class. In order to create an instance of `ORTQuantizer`, the user needs to provide an `ORTConfig` object defining the export and quantization information, such as the quantization approach to use or the activations and weights data types. Quantization can then be applied by calling the `ORTQuantizer.fit` method.
We have also added a new class called `IncOptimizer`, which will take care of combining the pruning and the quantization processes.
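A hypothetical sketch of the `ORTConfig`/`ORTOptimizer` flow described above; the class and method names appear in the notes, but the constructor fields and the `fit` signature shown here are assumptions:

```python
from optimum.onnxruntime import ORTConfig, ORTOptimizer

# Export and optimization strategy (field names are assumptions)
ort_config = ORTConfig(opset=12)

optimizer = ORTOptimizer(ort_config)
# Assumed signature: a model name or path plus an output directory
optimizer.fit("distilbert-base-uncased", output_dir="onnx_optimized")
```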
Published by echarlaix over 2 years ago
With this release, we enable Intel Neural Compressor v1.8 magnitude pruning for a variety of NLP tasks, with the introduction of `IncTrainer`, which handles the pruning process.
Published by echarlaix almost 3 years ago
With this release, we enable Intel Neural Compressor v1.7 PyTorch dynamic, post-training and aware-training quantization for a variety of NLP tasks. This support includes the overall process, from quantization application to the loading of the resulting quantized model. The latter being enabled by the introduction of the IncQuantizedModel
class.
Published by mfuntowicz about 3 years ago
Initial release for early access to the Optimum library, featuring Intel's LPOT quantization and pruning support.