⚡️ An easy-to-use and fast deep learning model deployment toolkit for ☁️ cloud, 📱 mobile and 📹 edge. Covers 20+ mainstream scenarios across image, video, text and audio, with 150+ SOTA models, end-to-end optimization, and multi-platform, multi-framework support.
APACHE-2.0 License
We are excited to announce the release of ⚡️FastDeploy 1.0.0! 🎉 FastDeploy supports high-performance, end-to-end deployment of over 150 AI models from PaddlePaddle and the open source community on multiple hardware platforms.
FastDeploy supports inference deployment on multiple hardware platforms with different backends. Each backend module can be flexibly compiled and integrated according to the developer's needs; see the FastDeploy compilation documentation.
Backend | Platform | Model Format | Supported Hardware in FastDeploy |
---|---|---|---|
Paddle Inference | Linux(x64)/Windows(x64) | Paddle | x86 CPU/NVIDIA GPU/Jetson/GraphCore IPU |
Paddle Lite | Linux(aarch64/armhf)/Android | Paddle | Arm CPU/Kunlun R200/RV1126 |
Poros | Linux(x64) | TorchScript | x86 CPU/NVIDIA GPU |
OpenVINO | Linux(x64)/Windows(x64)/OSX(x86) | Paddle/ONNX | x86 CPU/Intel GPU |
TensorRT | Linux(x64/aarch64)/Windows(x64) | Paddle/ONNX | NVIDIA GPU/Jetson |
ONNX Runtime | Linux(x64/aarch64)/Windows(x64)/OSX(x86/arm64) | Paddle/ONNX | x86 CPU/Arm CPU/NVIDIA GPU |
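The table above is essentially a compatibility lookup: given a model format and target hardware, only some backends apply. The following is an illustrative sketch of that lookup; it is not FastDeploy's actual API, just the table expressed as data for clarity.

```python
# Illustrative compatibility lookup mirroring the backend table above.
# NOT FastDeploy's real API -- purely a sketch of the selection logic.
BACKENDS = {
    "Paddle Inference": {"formats": {"Paddle"}, "hardware": {"x86 CPU", "NVIDIA GPU", "Jetson", "GraphCore IPU"}},
    "Paddle Lite":      {"formats": {"Paddle"}, "hardware": {"Arm CPU", "Kunlun R200", "RV1126"}},
    "Poros":            {"formats": {"TorchScript"}, "hardware": {"x86 CPU", "NVIDIA GPU"}},
    "OpenVINO":         {"formats": {"Paddle", "ONNX"}, "hardware": {"x86 CPU", "Intel GPU"}},
    "TensorRT":         {"formats": {"Paddle", "ONNX"}, "hardware": {"NVIDIA GPU", "Jetson"}},
    "ONNX Runtime":     {"formats": {"Paddle", "ONNX"}, "hardware": {"x86 CPU", "Arm CPU", "NVIDIA GPU"}},
}

def viable_backends(model_format: str, hardware: str) -> list[str]:
    """Return the backends that accept this model format on this hardware."""
    return [name for name, spec in BACKENDS.items()
            if model_format in spec["formats"] and hardware in spec["hardware"]]
```

For example, an ONNX model targeting an NVIDIA GPU can go through TensorRT or ONNX Runtime, while a TorchScript model on x86 CPU is served only by Poros.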
In addition, FastDeploy supports deploying models to web pages and mini programs based on Paddle.js; see Web Deployment for more details.
FastDeploy supports end-to-end deployment of models from the following PaddlePaddle model suites:
Beyond the PaddlePaddle suites, FastDeploy also supports popular deep learning models from the open source community, with over 150 models supported in release 1.0. The table below shows some of the key models; refer to the deployment examples for more details.
Task | Supported Models |
---|---|
Classification | ResNet/MobileNet/PP-LCNet/YOLOv5-Clas and other series models |
Object Detection | PP-YOLOE/PicoDet/RCNN/YOLOv5/YOLOv6/YOLOv7/YOLOX/NanoDet and other series models |
Segmentation | PP-LiteSeg/PP-HumanSeg/DeepLabv3p/UNet and other series models |
Image/Video Matting | PP-Matting/PP-Mattingv2/ModNet/RobustVideoMatting |
OCR | PP-OCRv2/PP-OCRv3 |
Video Super-Resolution | PP-MSVSR/BasicVSR/EDVR |
Object Tracking | PP-Tracking |
Pose/Keypoint Recognition | PP-TinyPose/HeadPose-FSANet |
Face Alignment | PFLD/FaceLandmark1000/PIPNet and other series models |
Face Detection | RetinaFace/UltraFace/YOLOv5-Face/SCRFD and other series models |
Face Recognition | ArcFace/CosFace/PartialFC/VPL/AdaFace and other series models |
Text-to-Speech | PaddleSpeech streaming speech synthesis models |
Semantic Representation | PaddleNLP ERNIE 3.0 Tiny series models |
Information Extraction | PaddleNLP Universal Information Extraction UIE model |
Text-to-Image Generation | Stable Diffusion |
⚡️FastDeploy provides a high-performance serving system for AI models based on Triton Inference Server, supporting fast serving deployment of Paddle/ONNX models on different hardware and different backends.
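Triton serves models over the KServe v2 inference protocol, so a serving client ultimately POSTs a JSON body to an endpoint like `/v2/models/<name>/infer`. As a hedged illustration (the tensor name `INPUT_0` and the values are placeholders; real names must match the served model's configuration), the request body can be built like this:

```python
import json

def build_infer_request(input_name, shape, data, datatype="FP32"):
    """Build a KServe-v2-style inference request body, as accepted by
    Triton Inference Server's HTTP endpoint /v2/models/<name>/infer.
    Tensor name, shape and datatype must match the served model config."""
    return json.dumps({
        "inputs": [{
            "name": input_name,
            "shape": list(shape),
            "datatype": datatype,
            "data": data,  # flattened row-major values
        }]
    })

# Example: a single 1x3 float input with placeholder values
body = build_infer_request("INPUT_0", (1, 3), [0.1, 0.2, 0.3])
```

The same payload shape is accepted regardless of which backend Triton dispatches to, which is what makes the serving layer backend-agnostic.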
FastDeploy provides a one-click quantization tool based on PaddleSlim; the following command quickly produces a compressed, accelerated model with near-lossless accuracy.
fastdeploy compress --config_path=./configs/detection/yolov5s_quant.yaml \
--method='PTQ' --save_dir='./yolov5s_ptq_model/'
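The `PTQ` method above stands for post-training quantization: float tensors are mapped to int8 using a scale calibrated from their observed dynamic range, with no retraining. As a minimal sketch of the idea (symmetric per-tensor quantization, not PaddleSlim's actual implementation):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor post-training quantization to int8.
    The scale is calibrated from the observed dynamic range of x."""
    scale = float(np.abs(x).max()) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float values from the int8 representation."""
    return q.astype(np.float32) * scale

x = np.array([0.5, -1.27, 0.0, 1.27], dtype=np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)  # close to x, within one quantization step
```

Real PTQ calibrates scales over a small calibration dataset rather than a single tensor, which is why the config file above points at calibration settings.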
FastDeploy has completed adaptation testing of quantized models with the following backends:
Hardware/Deployment backend | ONNX Runtime | Paddle Inference | TensorRT | Paddle Inference TensorRT | Paddle Lite |
---|---|---|---|---|---|
CPU | Supported | Supported | - | - | Supported |
GPU | - | - | Supported | Supported | - |
RV1126 | - | - | - | - | Supported |
The table below compares accuracy and performance with auto-compression: accuracy is nearly lossless, and performance improves by up to 400%.
For more details and usage of the one-click quantization, see FastDeploy one-click quantization.
To support deploying models from multiple frameworks, FastDeploy integrates X2Paddle conversion capabilities. After installing FastDeploy, the following command converts a model so it can then be deployed with FastDeploy.
fastdeploy convert --framework onnx --model yolov5s.onnx --save_dir yolov5s_paddle_model
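After a conversion like the one above, it is good practice to check numerical parity between the original and converted models on the same inputs. The sketch below shows such a check in generic form; the two callables stand in for loaded runtimes (e.g. ONNX Runtime for the original model versus the converted Paddle model), and the toy stand-ins here are just placeholders so the check itself is runnable.

```python
import numpy as np

def outputs_match(run_original, run_converted, input_batch,
                  rtol=1e-4, atol=1e-5) -> bool:
    """Compare two inference callables on the same input within tolerance.
    run_original / run_converted stand in for loaded model runtimes."""
    return np.allclose(run_original(input_batch),
                       run_converted(input_batch), rtol=rtol, atol=atol)

# Toy stand-ins: the same matmul computed in float32 vs float64
w = np.random.default_rng(0).standard_normal((8, 4)).astype(np.float32)
f32 = lambda x: x @ w
f64 = lambda x: (x.astype(np.float64) @ w.astype(np.float64)).astype(np.float32)
x = np.ones((2, 8), dtype=np.float32)
assert outputs_match(f32, f64, x)
```

Small numerical differences are expected across runtimes; the tolerances bound how much drift the conversion is allowed to introduce.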
For more usage details, see FastDeploy Model Conversion.
FastDeploy focuses on the end-to-end deployment experience and performance for every model. In version 1.0, FastDeploy has made the following end-to-end optimizations:
Combined with the advantages of FastDeploy's multi-backend support, the end-to-end inference performance of all models is significantly improved over the original deployment code. The following table shows test data for some of the models.
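"End-to-end" here means preprocessing + inference + postprocessing, not just the forward pass. A hedged sketch of how such latency numbers are typically collected (the three callables are placeholders for a real pipeline's stages):

```python
import time
import statistics

def bench_end_to_end(preprocess, infer, postprocess, sample,
                     warmup=5, runs=50):
    """Median wall-clock latency (ms) of the full pipeline.
    Warmup iterations are discarded to exclude one-time setup cost
    (e.g. kernel compilation, memory allocation)."""
    for _ in range(warmup):
        postprocess(infer(preprocess(sample)))
    timings = []
    for _ in range(runs):
        t0 = time.perf_counter()
        postprocess(infer(preprocess(sample)))
        timings.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(timings)

# Toy pipeline stand-ins, just to show the measurement shape
latency_ms = bench_end_to_end(lambda s: s, lambda s: s, lambda s: s, 0)
```

Using the median rather than the mean makes the figure robust to occasional scheduler hiccups, which matters when comparing deployments across backends.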
Thanks to the following developers for their contributions to FastDeploy! Contributors List
@leiqing1 @jiangjiajun @DefTruth @joey12300 @felixhjh @ziqi-jin @yunyaoXYY @wjj19950828 @heliqi @ZeyuChen @ChaoII @Zheng-Bicheng @wang-xinyu @HexToString @yeliang2258 @WinterGeng @LDOUBLEV @rainyfly @czr-gc @chenqianhe @kiddyjinjin @Zeref996 @TrellixVulnTeam @D-DanielYang @totorolin @hguandl @ChrisKong93 @Xiue233 @jm12138 @triple-Mu @yingshengBD @GodIsBoom @PatchTester @onecatcn
Published by jiangjiajun almost 2 years ago
`batch_predict` deployment: https://github.com/PaddlePaddle/FastDeploy/pull/611
Image Classification | Object Detection | Semantic Segmentation | OCR | Face Detection |
---|---|---|---|---|
Project Code | Project Code | Project Code | Project Code | Project Code |
Scan the code or click on the link to install and try out | Scan the code or click on the link to install and try out | Scan the code or click on the link to install and try out | Scan the code or click on the link to install and try out | Scan the code or click on the link to install and try out |
Full Changelog: https://github.com/PaddlePaddle/FastDeploy/compare/release/0.7.0...release/0.8.0
Published by jiangjiajun almost 2 years ago
- SCRFD: added RKNPU2 deployment support (deployment example)
- Stable Diffusion: added a model deployment example (deployment example)
- PaddleClas/PaddleDetection/YOLOv5: deployment code upgraded to support `predict` and `batch_predict`
- PaddleClas: added a serving deployment example (deployment example)
- `FDTensor`: added a `Pad` function operator to pad inputs during batch prediction
- `FDTensor`: added the Python API `to_dlpack`, supporting zero-copy transfer of `FDTensor` between different frameworks

Full Changelog: https://github.com/PaddlePaddle/FastDeploy/compare/release/0.6.0...release/0.7.0
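The `to_dlpack` interface added in 0.7.0 builds on the DLPack protocol, which lets tensors move between frameworks without copying the underlying buffer. NumPy implements the same protocol, so a NumPy round trip (requires NumPy >= 1.22) is enough to demonstrate the zero-copy behaviour:

```python
import numpy as np

# DLPack exchanges a pointer to the existing buffer, not a copy,
# so the imported array aliases the source array's memory.
src = np.arange(4, dtype=np.float32)
view = np.from_dlpack(src)  # consumes src.__dlpack__(); no data copy
src[0] = 42.0
# The write through src is visible through the DLPack-imported view.
```

The same mechanism is what allows an `FDTensor` to be handed to another DLPack-aware framework without a device-to-device or host copy.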
Published by jiangjiajun almost 2 years ago
Full Changelog: https://github.com/PaddlePaddle/FastDeploy/compare/release/0.4.0...release/0.6.0
Published by jiangjiajun almost 2 years ago
Full Changelog: https://github.com/PaddlePaddle/FastDeploy/compare/release/0.4.0...release/0.5.0
Published by jiangjiajun about 2 years ago
Version 0.4.0 adds Android mobile deployment support!
Added the `EnablePinedMemory` interface: when running inference with Paddle Inference or TensorRT, pinned memory can be used to improve GPU-to-CPU data transfer performance. See PR https://github.com/PaddlePaddle/FastDeploy/pull/403
Full Changelog: https://github.com/PaddlePaddle/FastDeploy/compare/release/0.3.0...release/0.4.0
Published by jiangjiajun about 2 years ago
Added a `max_workspace_size` setting interface; added `fastdeploy_init.sh` and `fastdeploy_init.bat` to help developers quickly import FastDeploy dependency libraries.
Full Changelog: https://github.com/PaddlePaddle/FastDeploy/compare/release/0.2.1...release/0.3.1
Published by jiangjiajun about 2 years ago
`SetTrtInputShape` no longer needs to be called to set the input range; shapes are now set dynamically during inference by default.
Full Changelog: https://github.com/PaddlePaddle/FastDeploy/compare/release/0.2.0...release/0.2.1
Published by jiangjiajun about 2 years ago
Published by jiangjiajun over 2 years ago
⚡️FastDeploy v0.1.0 beta released! 🎉
💎 Released SDKs supporting 40 key models across 8 key software/hardware environments
😊 Available both as a web download and as a pip package