A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.
Published by fangjiarui almost 4 years ago
The ALBERT model now uses the model-aware allocator.
Published by fangjiarui almost 4 years ago
Add a model-aware allocator for the BERT model.
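The idea behind a model-aware allocator is that a transformer's activation tensors have sizes and lifetimes that are known before inference runs, so buffer offsets in a single arena can be planned ahead of time and tensors with disjoint lifetimes can share memory. The sketch below illustrates that general technique with a simple greedy placement; the function name and strategy are illustrative assumptions, not TurboTransformers' actual implementation.

```python
def plan_offsets(tensors):
    """Plan arena offsets for tensors given as (name, size, first_op, last_op),
    letting tensors with non-overlapping lifetimes reuse the same bytes.
    Returns (offsets, arena_size)."""
    placed = []   # (offset, size, first_op, last_op)
    offsets = {}
    # Place big tensors first so smaller ones can fill the gaps.
    for name, size, first, last in sorted(tensors, key=lambda t: -t[1]):
        offset = 0
        while True:
            clash = next(
                (p for p in placed
                 if not (last < p[2] or first > p[3])                         # lifetimes overlap
                 and not (offset + size <= p[0] or offset >= p[0] + p[1])),   # bytes overlap
                None)
            if clash is None:
                break
            offset = clash[0] + clash[1]  # retry just past the clashing block
        placed.append((offset, size, first, last))
        offsets[name] = offset
    arena = max((o + s for o, s, _, _ in placed), default=0)
    return offsets, arena
```

For three tensors of sizes 100, 100, and 50 where the first two never coexist, the planner needs a 150-byte arena instead of the 250 bytes a naive allocator would reserve.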
Published by feifeibear about 4 years ago
Add quantized BERT inference using onnxruntime.
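Weight-only (dynamic) int8 quantization, the technique commonly applied to BERT's linear layers, stores weights as int8 with a floating-point scale and rescales at matmul time. This is a generic sketch of that arithmetic, not onnxruntime's actual kernel; the function names are made up for illustration.

```python
import numpy as np

def quantize_weights(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ~= scale * w_q."""
    scale = float(np.abs(w).max()) / 127.0
    w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return w_q, scale

def qlinear(x: np.ndarray, w_q: np.ndarray, scale: float):
    """Linear layer on quantized weights: x @ (scale * w_q).
    The product is computed once in higher precision, then rescaled."""
    return (x @ w_q.astype(np.float32)) * scale
```

Each weight is off by at most half a quantization step (scale / 2), so the error of an output element is bounded by that step times the magnitude of the inputs it sums over.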
Published by feifeibear about 4 years ago
Use onnxruntime-cpu as a CPU backend, in parallel with our own home-grown implementation.
Published by feifeibear over 4 years ago
Support the Transformer decoder used in OpenNMT-py.
New GPU memory allocator.
Compatible with PyTorch v1.5.0.
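GPU memory allocators in inference runtimes typically cache freed blocks in a size-keyed free list and hand them back to later requests, so the expensive cudaMalloc/cudaFree calls only happen on cache misses. The sketch below shows that caching strategy in miniature; the class and its best-fit policy are hypothetical, not TurboTransformers' actual allocator.

```python
import bisect

class CachingAllocator:
    def __init__(self, raw_malloc):
        self.raw_malloc = raw_malloc   # e.g. a cudaMalloc wrapper
        self.free_blocks = []          # sorted list of (size, ptr)
        self.sizes = {}                # ptr -> size

    def alloc(self, size):
        # Best fit: smallest cached block that is large enough.
        i = bisect.bisect_left(self.free_blocks, (size,))
        if i < len(self.free_blocks):
            _, ptr = self.free_blocks.pop(i)
            return ptr
        ptr = self.raw_malloc(size)    # cache miss: real device allocation
        self.sizes[ptr] = size
        return ptr

    def free(self, ptr):
        # Return the block to the cache instead of releasing it.
        bisect.insort(self.free_blocks, (self.sizes[ptr], ptr))
```

Because transformer inference allocates the same activation shapes on every forward pass, the cache hit rate is high after the first batch and the device allocator is rarely touched again.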
Published by feifeibear over 4 years ago
Add BLIS to the BLAS backend options.
Published by feifeibear over 4 years ago
BERT acceleration on CPU and GPU.