optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools

Apache-2.0 License · Downloads: 946K · Stars: 2.1K · Committers: 84


optimum - v1.9.1: Patch release

Published by echarlaix over 1 year ago

Full Changelog: https://github.com/huggingface/optimum/compare/v1.9.0...v1.9.1

optimum - v1.9: extended ONNX, ONNX Runtime support

Published by fxmarty over 1 year ago

Improved memory management in the ONNX export

Memory usage during the ONNX export has been lowered. This is especially useful when exporting large models or exporting on a CUDA device. Until the PyTorch 2.1 release, we recommend using a PyTorch nightly build if memory issues are encountered, as two major bugs were fixed on the PyTorch side: https://github.com/pytorch/pytorch/pull/101134 https://github.com/pytorch/pytorch/pull/101148

Extended ONNX export

The ONNX export now supports the sam, lilt, pix2struct, cvt and owlvit architectures.

Support of custom ONNX configurations for export

The main_export method now supports two arguments, model_kwargs and custom_onnx_configs, which allow advanced users to customize the export. See the reference documentation for details.
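
A hedged sketch of the mechanism (the model id, config class and commented-out kwargs below are illustrative assumptions, not an official example):

from transformers import AutoConfig
from optimum.exporters.onnx import main_export
from optimum.exporters.onnx.model_configs import BertOnnxConfig

model_id = "bert-base-uncased"
config = AutoConfig.from_pretrained(model_id)

# Key each exported subgraph by name ("model" for a single-graph export) and provide
# the OnnxConfig instance to use for it instead of the default one.
custom_onnx_configs = {"model": BertOnnxConfig(config, task="text-classification")}

main_export(
    model_id,
    output="bert_onnx/",
    task="text-classification",
    custom_onnx_configs=custom_onnx_configs,
    # model_kwargs={"output_attentions": True},  # extra kwargs can be forwarded to the model's forward
)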

Extended BetterTransformer support

ONNX Runtime: use IO Binding by default for decoder models on CPUExecutionProvider

IO Binding is useful not only to avoid copies between RAM and device memory, but also to avoid copies between numpy tensors and OrtValue. For autoregressive tasks, IO Binding is therefore now enabled by default on CPUExecutionProvider as well, which may bring a >10% speedup for large context lengths.
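
In practice (illustrative model id), IO Binding can still be disabled explicitly if needed:

from optimum.onnxruntime import ORTModelForCausalLM

# On CPUExecutionProvider, IO Binding is now enabled by default for decoder models.
model = ORTModelForCausalLM.from_pretrained("gpt2", export=True)

# It can still be turned off explicitly.
model_no_io_binding = ORTModelForCausalLM.from_pretrained("gpt2", export=True, use_io_binding=False)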

ORTModelForSpeechSeq2Seq supported in ORTOptimizer
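
A hedged sketch of running the optimizer on a speech seq2seq model (the model id and optimization level are illustrative):

from optimum.onnxruntime import ORTModelForSpeechSeq2Seq, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig

# Export the model to ONNX, then optimize its subgraphs with ONNX Runtime.
model = ORTModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny.en", export=True)
optimizer = ORTOptimizer.from_pretrained(model)
optimization_config = OptimizationConfig(optimization_level=2)
optimizer.optimize(optimization_config=optimization_config, save_dir="whisper_tiny_optimized")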

Major bugfixes

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/optimum/compare/v1.8.0...v1.9.0

optimum - v1.8.8: Patch release

Published by echarlaix over 1 year ago

Full Changelog: https://github.com/huggingface/optimum/compare/v1.8.7...v1.8.8

optimum - v1.8.7: Patch release

Published by echarlaix over 1 year ago

optimum - v1.8.6: Patch release

Published by regisss over 1 year ago

  • Fix CLI for exporting models to TFLite by @regisss #1059

Full Changelog: https://github.com/huggingface/optimum/compare/v1.8.5...v1.8.6

optimum - v1.8.5: Patch release

Published by regisss over 1 year ago

  • Add transformers<4.29.0 in Habana extra by @regisss in #1047

Full Changelog: https://github.com/huggingface/optimum/compare/v1.8.4...v1.8.5

optimum - v1.8.4: Patch release

Published by echarlaix over 1 year ago

optimum - v1.8.3: Patch release

Published by echarlaix over 1 year ago

Full Changelog: https://github.com/huggingface/optimum/compare/v1.8.2...v1.8.3

optimum - v1.8: extended BetterTransformer support, ONNX merged seq2seq models

Published by fxmarty over 1 year ago

Extended BetterTransformer support

Various improvements in the PyTorch BetterTransformer integration.

ONNX merged seq2seq models

Instead of using two separate decoder_model.onnx and decoder_with_past_model.onnx files, a single decoder can be used for encoder-decoder models: decoder_model_merged.onnx. This avoids duplicating weights between the without-past and with-past ONNX models.

By default, decoder_model_merged.onnx will be used in the ORTModel integration when it is available. This can be disabled with the --no-post-process option in the ONNX export CLI, and with use_merged=False in the ORTModel.from_pretrained method.

Example:

optimum-cli export onnx --model t5-small t5_onnx

will give:

└── t5_onnx
    ├── config.json
    ├── decoder_model_merged.onnx
    ├── decoder_model.onnx
    ├── decoder_with_past_model.onnx
    ├── encoder_model.onnx
    ├── generation_config.json
    ├── special_tokens_map.json
    ├── spiece.model
    ├── tokenizer_config.json
    └── tokenizer.json

The decoder_model_merged.onnx file alone is enough for inference. If the exported model is to be used with an engine other than ONNX Runtime through the Optimum integration, we strongly recommend inspecting the subgraphs with netron to understand the inputs and outputs.
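
As an illustrative sketch, assuming the t5_onnx directory exported above, the merged decoder is picked up automatically, and use_merged=False falls back to the separate decoders:

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5_onnx")

# Uses decoder_model_merged.onnx by default since it is present in the directory.
model = ORTModelForSeq2SeqLM.from_pretrained("t5_onnx")

# Fall back to the separate decoder_model.onnx / decoder_with_past_model.onnx files.
model_unmerged = ORTModelForSeq2SeqLM.from_pretrained("t5_onnx", use_merged=False)

inputs = tokenizer("translate English to German: hello", return_tensors="pt")
print(tokenizer.batch_decode(model.generate(**inputs)))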

New models in the ONNX export

Major bugfix

Potentially breaking changes

The TasksManager replaces legacy tasks names by the canonical ones used on the Hub and in transformers metadata:

  • sequence-classification becomes text-classification,
  • causal-lm becomes text-generation,
  • seq2seq-lm becomes text2text-generation,
  • speech2seq-lm and audio-ctc become automatic-speech-recognition,
  • default becomes feature-extraction,
  • masked-lm becomes fill-mask,
  • vision2seq-lm becomes image-to-text

This should not break anything, unless you rely on private methods and attributes of TasksManager.

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/optimum/compare/v1.7.3...v1.8.2

optimum - v1.7.3: Patch release for PyTorch 2.0 and transformers 4.27.0

Published by fxmarty over 1 year ago

This patch release fixes a few bugs with the PyTorch 2.0 release, and includes a few new features as well.

Breaking change: constant outputs removed from ONNX encoder-decoder models

We removed some constant past key value outputs from encoder-decoder models in the ONNX export. Beware that this could potentially break your existing code, but we recommend using the newly exported models, as this removes unnecessary Identity nodes from the models.

torch.nn.functional.scaled_dot_product_attention support for decoders in BetterTransformer

PyTorch 2.0 introduces torch.nn.functional.scaled_dot_product_attention in beta, a fastpath for attention that extends its accelerated transformer features. It is included in optimum.bettertransformer and can be used with the following architectures: Bart, Blenderbot, GPT2, GPT-J, M2M100, Marian, Mbart, OPT, Pegasus, T5.

Beware that this is still experimental and speedups have yet to be validated on all architectures.

PyTorch's scaled_dot_product_attention makes it possible to use flash attention and memory-efficient attention natively in PyTorch.

Usage is as follows:

from transformers import AutoTokenizer, AutoModelForCausalLM
from optimum.bettertransformer import BetterTransformer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

model = BetterTransformer.transform(model)  # modify transformers modeling to use native scaled_dot_product_attention

# do your inference or training here

model = BetterTransformer.reverse(model)  # go back to using canonical transformers modeling
model.save_pretrained("gpt2_model")

Inference benchmark (on fp16):

| Model | batch size | Input sequence length | Generated tokens | Latency eager (s) | Latency BT (s) | Speedup | Peak memory eager (MB) | Peak memory BT (MB) | Memory savings |
|---|---|---|---|---|---|---|---|---|---|
| gpt2 | 1 | 64 | 256 | 1.800 | 1.607 | 12.0% | 569.90 | 569.89 | 0% |
| gpt2 | 64 | 64 | 256 | 2.159 | 1.617 | 33.5% | 2067.45 | 2093.80 | 0% |
| opt-1.3b | 1 | 64 | 256 | 3.010 | 2.667 | 12.9% | 5408.238 | 5408.238 | 0% |
| gpt-neox-20b | 1 | 64 | 256 | 10.869 | 9.937 | 9.4% | 83670.67 | 83673.53 | 0% |

Training benchmark (on fp16):

| Model | batch size | Sequence length | time/epoch (eager, s) | time/epoch (BT, s) | Speedup | Peak memory eager (MB) | Peak memory BT (MB) | Memory savings |
|---|---|---|---|---|---|---|---|---|
| gpt2 | 8 | 1024 | 17.732 | 14.037 | 26.3% | 13291.16 | 10191.52 | 30.4% |
| gpt2 | 32 | 1024 | 17.336 | 13.309 | 30.3% | 52834.83 | 38858.56 | 36.0% |
| gpt2 | 64 | 1024 | OOM | 14.067 | / | OOM | 75600.08 | / |

Benchmarks can be reproduced using the inference script and training script:

python benchmark_bettertransformer.py --model-name gpt2 --use-half --use-cuda --is_decoder --num-batches 5 --max_token 256
python benchmark_bettertransformer.py --model-name gpt2 --use-half --use-cuda --is_decoder --num-batches 5 --max_token 256 --seqlen-stdev 0

New architectures in the ONNX export

Three additional architectures are supported in the ONNX export: ImageGPT, RegNet, OPT.

(WIP) TFLite export with quantization support

Continued progress in the TFLite export with quantization support. This is work in progress and not documented yet.

Bugfixes and improvements

New Contributors

Full Changelog: https://github.com/huggingface/optimum/compare/v1.2.0...v1.7.2

optimum - v1.7.1: Patch release

Published by fxmarty over 1 year ago

New models supported in the ONNX export

Additional architectures are supported in the ONNX export: PoolFormer, Pegasus, Audio Spectrogram Transformer, Hubert, SEW, Speech2Text, UniSpeech, UniSpeech-SAT, Wav2Vec2, Wav2Vec2-Conformer, WavLM, Data2Vec Audio, MPNet, stable diffusion VAE encoder, vision encoder decoder, Nystromformer, Splinter, GPT NeoX.

New models supported in BetterTransformer

A few additional architectures are supported in BetterTransformer: RoCBERT, RoFormer, Marian

Additional tasks supported in the ONNX Runtime integration

With ORTModelForMaskedLM, ORTModelForVision2Seq, ORTModelForAudioClassification, ORTModelForCTC, ORTModelForAudioXVector, ORTModelForAudioFrameClassification, ORTStableDiffusionPipeline.

Reference: https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort and https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/models#export-and-inference-of-stable-diffusion-models

Support of the ONNX export from PyTorch on float16

In the ONNX export, it is possible to pass the options --fp16 --device cuda to export in float16 when a GPU is available, relying directly on the native torch.onnx.export.

Example: optimum-cli export onnx --model gpt2 --fp16 --device cuda gpt2_onnx/

TFLite export

TFLite export is now supported, with static shapes:

optimum-cli export tflite --help
optimum-cli export tflite --model bert-base-uncased --sequence_length 128 bert_tflite/

ONNX Runtime optimization and quantization directly in the CLI

The ONNX export can optionally apply ONNX Runtime optimizations directly during the export, by passing an option from --optimize O1 up to --optimize O4:

optimum-cli export onnx --help
optimum-cli export onnx --model t5-small --optimize O3 t5small_onnx/

ONNX Runtime quantization is supported directly in command line, using optimum-cli onnxruntime quantize:

optimum-cli onnxruntime quantize --help
optimum-cli onnxruntime quantize --onnx_model distilbert_onnx --avx512

ONNX Runtime optimization is supported directly in command line, using optimum-cli onnxruntime optimize:

optimum-cli onnxruntime optimize --help
optimum-cli onnxruntime optimize --onnx_model distilbert_onnx -O3

ORTModelForCausalLM supports decoding with a single ONNX

Up to now, for decoders, two ONNX files were used:

  • One handling the first forward pass where no past key values have been cached yet - thus not taking them as input.
  • One handling the following forward pass where past key values have been cached, thus taking them as input.

This release introduces support in the ONNX export and in ORTModelForCausalLM for a single ONNX file handling both steps of the decoding. This reduces memory usage, as weights are not duplicated between two separate models during inference.

A single ONNX file for decoders can be used by passing use_merged=True to ORTModelForCausalLM.from_pretrained, loading directly from a PyTorch model:

from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained("gpt2", export=True, use_merged=True)

Alternatively, exporting a single ONNX file for decoders is the default behavior of the ONNX export, and the result can later be used, for example, with ORTModelForCausalLM. The command optimum-cli export onnx --model gpt2 gpt2_onnx/ will produce:

└── gpt2_onnx
    ├── config.json
    ├── decoder_model_merged.onnx
    ├── decoder_model.onnx
    ├── decoder_with_past_model.onnx
    ├── merges.txt
    ├── special_tokens_map.json
    ├── tokenizer_config.json
    ├── tokenizer.json
    └── vocab.json

decoder_model.onnx and decoder_with_past_model.onnx are kept separate for backward compatibility, but decoder_model_merged.onnx alone is enough for inference.

Single-file ORTModel accepts numpy arrays

ORTModel now accepts numpy arrays as inputs, in addition to PyTorch tensors. This is only the case for models that use a single ONNX file.
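
A minimal sketch, with an illustrative model id:

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)

# numpy inputs are accepted directly, in addition to PyTorch tensors.
inputs = tokenizer("Optimum makes ONNX Runtime easy to use.", return_tensors="np")
outputs = model(**inputs)
print(outputs.logits)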

ORTOptimizer support for ORTModelForCausalLM

Breaking changes

  • In the ONNX export, exporting models as several ONNX files (encoder, decoder) is now the default behavior: https://github.com/huggingface/optimum/pull/747. The old behavior is still accessible with --monolith.
  • In decoders, reusing past key values is now the default in the ONNX export: https://github.com/huggingface/optimum/pull/748. The old behavior is still accessible by explicitly passing, for example, --task causal-lm instead of --task causal-lm-with-past.
  • BigBird support in the ONNX export is removed, due to the block_sparse attention type being written in pure numpy in Transformers, and hence not exportable to ONNX: https://github.com/huggingface/optimum/pull/778
  • The parameter from_transformers of ORTModel.from_pretrained will be deprecated in favor of export.

Bugfixes and improvements

Full Changelog: https://github.com/huggingface/optimum/compare/v1.6.0...v1.7.0

optimum - v1.6.4: Patch release

Published by fxmarty over 1 year ago

Bugfix

Full Changelog: https://github.com/huggingface/optimum/compare/v1.6.3...v1.6.4

optimum - v1.6.3: Patch release

Published by JingyaHuang over 1 year ago

Fixes ORTTrainer for inference with the ONNX Runtime backend.

optimum - v1.6.2: Patch release

Published by fxmarty over 1 year ago

Hotfixes

Regressions

The export of the speech-to-text architecture as a single ONNX file (handling both the encoding and decoding) fails due to a regression with the latest transformers version: https://github.com/huggingface/optimum/issues/721

Full Changelog: https://github.com/huggingface/optimum/compare/v1.6.1...v1.6.2

optimum - v1.6.1: Patch release

Published by fxmarty almost 2 years ago

Hotfixes

Full Changelog: https://github.com/huggingface/optimum/compare/v1.6.0...v1.6.1

Optimum CLI

The Optimum command line interface is introduced, and is now the official entrypoint for the ONNX export. Example commands:

optimum-cli --help
optimum-cli export onnx --help
optimum-cli export onnx --model bert-base-uncased --task sequence-classification bert_onnx/

Stable Diffusion ONNX export

Optimum now supports the ONNX export of stable diffusion models from the diffusers library:

optimum-cli export onnx --model runwayml/stable-diffusion-v1-5 sd_v15_onnx/

BetterTransformer support for more architectures

BetterTransformer integration includes new models in this release: CLIP, RemBERT, mBART, ViLT, FSMT

The complete list of supported models is available in the documentation.

ONNX export for more architectures

The ONNX export now supports Swin, MobileNet-v1, MobileNet-v2.

Extended ONNX export for encoder-decoder and decoder models

Encoder-decoder and decoder-only models that normally make use of the generate() method in transformers can now be exported as several ONNX files using the --for-ort argument:

optimum-cli export onnx --model t5-small --task seq2seq-lm-with-past --for-ort t5_small_onnx

yielding:

.
└── t5_small_onnx
    ├── config.json
    ├── decoder_model.onnx
    ├── decoder_with_past_model.onnx
    ├── encoder_model.onnx
    ├── special_tokens_map.json
    ├── spiece.model
    ├── tokenizer_config.json
    └── tokenizer.json

When passing --for-ort, the exported models are expected to be directly loadable into ORTModel.
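
For instance, a minimal sketch loading the directory exported above into the corresponding ORTModel class (assuming the seq2seq task from the command shown earlier):

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5_small_onnx")
model = ORTModelForSeq2SeqLM.from_pretrained("t5_small_onnx")

inputs = tokenizer("translate English to French: Hello, world!", return_tensors="pt")
print(tokenizer.batch_decode(model.generate(**inputs)))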

Support for ONNX models with external data at export, optimization, quantization

The ONNX export from PyTorch normally creates external data files in case the exported model is larger than 2 GB. This release introduces better support for the export and use of large models, writing all external data into a .onnx_data file if necessary.

ONNX Runtime API improvement

Various improvements to allow for a better user experience in the ONNX Runtime integration:

  • ORTModel, ORTModelDecoder and ORTModelForConditionalGeneration can now load any ONNX model file regardless of its name, making it possible to load optimized and quantized models without having to specify a file name argument.

  • ORTModel.from_pretrained() with from_transformers=True now downloads and loads the model in a temporary directory instead of the cache, which was not the right place to store it.

  • ORTQuantizer.save_pretrained() now saves the model configuration and the preprocessor, making the exported directory usable end-to-end.

  • ORTOptimizer.save_pretrained() now saves the preprocessor, making the exported directory usable end-to-end.

  • ONNX Runtime integration API improvement by @michaelbenayoun in https://github.com/huggingface/optimum/pull/515

Custom shapes support at ONNX export

The shapes of the example inputs provided for the ONNX export can be overridden in case the validity of the ONNX model is sensitive to the shapes used during the export.

Read more: optimum-cli export onnx --help

Enable use_cache=True for ORTModelForCausalLM

Reusing past key values for models using ORTModelForCausalLM (e.g. gpt2) is now possible with use_cache=True, avoiding recomputing them at each iteration of the decoding:

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = ORTModelForCausalLM.from_pretrained("gpt2", from_transformers=True, use_cache=True)

inputs = tokenizer("My name is Arthur and I live in", return_tensors="pt")

gen_tokens = model.generate(**inputs)
tokenizer.batch_decode(gen_tokens)

IO binding support for ORTModelForCustomTasks

ORTModelForCustomTasks now supports IO Binding when using CUDAExecutionProvider.
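
A hedged sketch (the model id is illustrative; provider and use_io_binding are existing from_pretrained arguments):

from optimum.onnxruntime import ORTModelForCustomTasks

# With CUDAExecutionProvider, IO Binding avoids host/device copies during inference.
model = ORTModelForCustomTasks.from_pretrained(
    "optimum/sbert-all-MiniLM-L6-with-pooler",
    provider="CUDAExecutionProvider",
    use_io_binding=True,
)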

Experimental support to merge ONNX decoder with/without past key values

Along with --for-ort, passing --task causal-lm-with-past, --task seq2seq-lm-with-past or --task speech2seq-lm-with-past during the ONNX export produces two models: one that does not use the previously computed keys/values, and one that uses them.

Experimental support is introduced to merge the two models into one. Example:

optimum-cli export onnx --model t5-small --task seq2seq-lm-with-past --for-ort t5_onnx/

import onnx
from optimum.onnx import merge_decoders

decoder = onnx.load("t5_onnx/decoder_model.onnx")
decoder_with_past = onnx.load("t5_onnx/decoder_with_past_model.onnx")

merged_model = merge_decoders(decoder, decoder_with_past)
onnx.save(merged_model, "t5_onnx/decoder_merged_model.onnx")

Major bugs fixed

Other changes, bugfixes and improvements

Full Changelog: https://github.com/huggingface/optimum/compare/v1.5.2...v1.6.0

Significant community contributions

The following contributors have made significant changes to the library over the last release:

optimum - v1.5.2: Patch release

Published by fxmarty almost 2 years ago

Temporarily constrain numpy<1.24.0 (#614)

optimum - v1.5.1: Patch release

Published by fxmarty almost 2 years ago

Deprecate PyTorch 1.12 for BetterTransformer, with a better error message (#513)

BetterTransformer

Convert your model into its PyTorch BetterTransformer format with a one-liner, using the new BetterTransformer integration for faster inference on CPU and GPU!

from optimum.bettertransformer import BetterTransformer

model = BetterTransformer.transform(model)

Check the full list of supported models in the documentation, and check out the Google Colab demo.

Contributions

  • BetterTransformer integration (#423)
  • ViT and Wav2Vec2 support (#470)

ONNX Runtime IOBinding support

ORT models (except for ORTModelForCustomTasks) now support IOBinding to avoid data copying overheads between the host and device, bringing a significant inference speedup during the decoding process on GPU.

By default, use_io_binding is set to True when using CUDA. You can turn off the IOBinding in case of any memory issue:

from optimum.onnxruntime import ORTModelForSeq2SeqLM

model = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small", use_io_binding=False)

Contributions

  • Add IOBinding support to ONNX Runtime module (#421)

Optimum Exporters

optimum.exporters is a new module that handles the export of PyTorch and TensorFlow models to several backends. Only ONNX is supported for now, and more than 50 architectures can already be exported, among them BERT, GPT-Neo, Bloom, T5, ViT, Whisper and CLIP.

The export can be done via the CLI:

python -m optimum.exporters.onnx --model openai/whisper-tiny.en whisper_onnx/

For more information, check the documentation.

Contributions

  • optimum.exporters creation (#403)
  • Automatic task detection (#445)

Whisper

  • Whisper can be exported to ONNX using optimum.exporters.
  • Whisper can also be exported and run using optimum.onnxruntime; IO Binding is also supported.

Note: For now, the export from optimum.exporters will not be usable by ORTModelForSpeechSeq2Seq. To be able to run inference, export Whisper directly using ORTModelForSpeechSeq2Seq. This will be solved in the next release.
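
A minimal sketch of the recommended path for this release (from_transformers=True was the export flag at the time):

from optimum.onnxruntime import ORTModelForSpeechSeq2Seq

# Export Whisper to ONNX directly through ORTModelForSpeechSeq2Seq and save it for reuse.
model = ORTModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny.en", from_transformers=True)
model.save_pretrained("whisper_onnx/")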

Contributions

  • Whisper support with optimum.onnxruntime and optimum.exporters (#420)

Other contributions

  • ONNX Runtime training now supports ORT 1.13.1 and transformers 4.23.1 (#434)
  • ORTModel can load models from subfolders in a similar fashion to transformers (#443)
  • ORTOptimizer has been refactored, and a factory class has been added to create common OptimizationConfigs (#457)
  • Fixes and updates in the documentation (#411, #432, #437, #441)
  • Fixes IOBinding (#454, #461)