
⚡️An easy-to-use and fast deep learning model deployment toolkit for ☁️Cloud, 📱Mobile and 📹Edge. It covers 20+ mainstream image, video, text and audio scenarios and 150+ SOTA models, with end-to-end optimization and multi-platform, multi-framework support.

APACHE-2.0 License

FastDeploy - FastDeploy 1.0.0 Latest Release

Published by jiangjiajun almost 2 years ago

1.0.0 Release Note

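⚡️FastDeploy 1.0.0, the high-performance AI deployment toolkit for all scenarios, is officially released! 🎉 It supports high-performance deployment of 150+ models from PaddlePaddle and the open source community across multiple hardware targets, giving developers a new deployment experience that covers all scenarios, is easy to use and is highly efficient.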


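FastDeploy supports inference deployment on multiple hardware targets through different backends. Each backend module can be compiled and integrated as the developer needs; to build it yourself, see the FastDeploy build documentation.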

Backend | Platform | Supported Model Format | Supported Hardware
--- | --- | --- | ---
Paddle Inference | Linux(x64)/Windows(x64) | Paddle | x86 CPU/NVIDIA GPU/Jetson/GraphCore IPU
Paddle Lite | Linux(aarch64/armhf)/Android | Paddle | Arm CPU/Kunlun R200/RV1126
Poros | Linux(x64) | TorchScript | x86 CPU/NVIDIA GPU
OpenVINO | Linux(x64)/Windows(x64)/OSX(x86) | Paddle/ONNX | x86 CPU/Intel GPU
TensorRT | Linux(x64/aarch64)/Windows(x64) | Paddle/ONNX | NVIDIA GPU/Jetson
ONNX Runtime | Linux(x64/aarch64)/Windows(x64)/OSX(x86/arm64) | Paddle/ONNX | x86 CPU/Arm CPU/NVIDIA GPU

In addition, FastDeploy supports deploying models in web front ends and smart mini-programs, based on Paddle.js. See Web Deployment for more details.



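Besides the PaddlePaddle development kits, FastDeploy also supports deploying popular deep learning models from the open source community. v1.0 supports 150+ models in total; the table below lists some of the key models. See the deployment examples for more details.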

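Task | Supported Models
--- | ---
Image Classification | ResNet/MobileNet/PP-LCNet/YOLOv5-Clas and other model series
Semantic Segmentation | PP-LiteSeg/PP-HumanSeg/DeepLabv3p/UNet and other model series
Image/Video Matting | PP-Matting/PP-Mattingv2/ModNet/RobustVideoMatting
OCR | PP-OCRv2/PP-OCRv3
Object Tracking | PP-Tracking
Pose/Key-point Recognition | PP-TinyPose/HeadPose-FSANet
Face Alignment | PFLD/FaceLandmark1000/PIPNet and other model series
Face Detection | RetinaFace/UltraFace/YOLOv5-Face/SCRFD and other model series
Face Recognition | ArcFace/CosFace/PartialFC/VPL/AdaFace and other model series
Text-to-Speech | PaddleSpeech streaming speech synthesis model
Semantic Representation | PaddleNLP ERNIE 3.0 Tiny model series
Information Extraction | PaddleNLP Universal Information Extraction (UIE) model
Text-to-Image Generation | Stable Diffusion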


FastDeploy provides serving deployment based on Triton Inference Server. It supports high-performance serving of Paddle/ONNX models on different hardware and different backends.



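FastDeploy provides a one-click quantization tool based on PaddleSlim. The following command quickly performs lossless compression and acceleration of a model.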

fastdeploy compress --config_path=./configs/detection/yolov5s_quant.yaml \
                    --method='PTQ' --save_dir='./yolov5s_ptq_model/'  


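Hardware/Inference backend | ONNX Runtime | Paddle Inference | TensorRT | Paddle Inference TensorRT | Paddle Lite
--- | --- | --- | --- | --- | ---
CPU | Supported | Supported | - | - | Supported
GPU | - | - | Supported | Supported | -
RK1126 | - | - | - | - | Supported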




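To make it easier to deploy models from multiple frameworks, FastDeploy ships with built-in X2Paddle conversion. After installing FastDeploy, the following command converts a model so that it can be deployed with FastDeploy.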

fastdeploy convert --framework onnx --model yolov5s.onnx --save_dir yolov5s_paddle_model




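  • Server side: preprocessing steps are fused to reduce memory allocation overhead and computation
  • Mobile: integrates FlyCV, a high-performance image processing library developed in-house by Baidu's vision technology department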


1.0.0 Release Note

We are excited to announce the release of ⚡️FastDeploy 1.0.0! 🎉 FastDeploy supports high-performance end-to-end deployment of over 150 AI models from PaddlePaddle and the open source community on multiple hardware targets.

Multiple Inference Backend and Hardware Support

FastDeploy supports inference deployment on multiple hardware targets through different backends. Each backend module can be compiled and integrated flexibly according to the developer's needs; see the FastDeploy compilation documentation.

Backend | Platform | Model Format | Supported Hardware in FastDeploy
--- | --- | --- | ---
Paddle Inference | Linux(x64)/Windows(x64) | Paddle | x86 CPU/NVIDIA GPU/GraphCore IPU
Paddle Lite | Linux(aarch64/armhf)/Android | Paddle | Arm CPU/Kunlun R200/RV1126
Poros | Linux(x64)/Windows(x64) | TorchScript | x86 CPU/NVIDIA GPU
OpenVINO | Linux(x64)/Windows(x64)/OSX(x86) | Paddle/ONNX | x86 CPU/Intel GPU
TensorRT | Linux(x64/aarch64)/Windows(x64) | Paddle/ONNX | NVIDIA GPU/Jetson
ONNX Runtime | Linux(x64/aarch64)/Windows(x64)/OSX(x86/arm64) | Paddle/ONNX | x86 CPU/Arm CPU/NVIDIA GPU
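
The table can also be read as a lookup: given a model format and a target device, which backends are candidates? The sketch below only transcribes the English table into a dictionary; the `BACKENDS` layout and the `candidate_backends` helper are illustrative names, not part of FastDeploy.

```python
# Rows transcribed from the English backend table above.
# The dictionary layout and the helper name are assumptions made for illustration.
BACKENDS = {
    "Paddle Inference": {"formats": {"Paddle"},
                         "hardware": {"x86 CPU", "NVIDIA GPU", "GraphCore IPU"}},
    "Paddle Lite": {"formats": {"Paddle"},
                    "hardware": {"Arm CPU", "Kunlun R200", "RV1126"}},
    "Poros": {"formats": {"TorchScript"},
              "hardware": {"x86 CPU", "NVIDIA GPU"}},
    "OpenVINO": {"formats": {"Paddle", "ONNX"},
                 "hardware": {"x86 CPU", "Intel GPU"}},
    "TensorRT": {"formats": {"Paddle", "ONNX"},
                 "hardware": {"NVIDIA GPU", "Jetson"}},
    "ONNX Runtime": {"formats": {"Paddle", "ONNX"},
                     "hardware": {"x86 CPU", "Arm CPU", "NVIDIA GPU"}},
}


def candidate_backends(model_format, hardware):
    """Return the backends whose table row lists both the format and the hardware."""
    return sorted(
        name
        for name, row in BACKENDS.items()
        if model_format in row["formats"] and hardware in row["hardware"]
    )


print(candidate_backends("ONNX", "NVIDIA GPU"))
```

The platform column is left out for brevity; a fuller version would filter on operating system and architecture as well.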

In addition, FastDeploy supports deploying models on the web and in mini-programs, based on Paddle.js. See Web Deployment for more details.

AI Model End-to-end Inference Support

FastDeploy supports end-to-end deployment of models from the PaddlePaddle development kits.

In addition, FastDeploy supports deploying popular deep learning models from the open source community. Release 1.0 supports over 150 models; the table below shows some of the key ones. Refer to the deployment examples for more details.

Task | Supported Models
--- | ---
Classification | ResNet/MobileNet/PP-LCNet/YOLOv5-Clas and other model series
Object Detection | PP-YOLOE/PicoDet/RCNN/YOLOv5/YOLOv6/YOLOv7/YOLOX/NanoDet and other model series
Segmentation | PP-LiteSeg/PP-HumanSeg/DeepLabv3p/UNet and other model series
Image/Video Matting | PP-Matting/PP-Mattingv2/ModNet/RobustVideoMatting
Video Super-Resolution | PP-MSVSR/BasicVSR/EDVR
Object Tracking | PP-Tracking
Pose/Key-point Recognition | PP-TinyPose/HeadPose-FSANet
Face Alignment | PFLD/FaceLandmark1000/PIPNet and other model series
Face Detection | RetinaFace/UltraFace/YOLOv5-Face/SCRFD and other model series
Face Recognition | ArcFace/CosFace/PartialFC/VPL/AdaFace and other model series
Text-to-Speech | PaddleSpeech streaming speech synthesis model
Semantic Representation | PaddleNLP ERNIE 3.0 model series
Information Extraction | PaddleNLP Universal Information Extraction (UIE) model
Content Generation | Stable Diffusion

High Performance Serving Deployment

⚡️FastDeploy provides a high-performance serving system for AI models based on Triton Inference Server. It supports fast service-based deployment of Paddle/ONNX models on different hardware and different backends.

Tool Components

PaddleSlim Auto Compression Toolkit

FastDeploy provides a one-click quantization tool based on PaddleSlim. The following command quickly compresses and accelerates a model without loss of accuracy.

fastdeploy compress --config_path=./configs/detection/yolov5s_quant.yaml \
                    --method='PTQ' --save_dir='./yolov5s_ptq_model/'  
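
When the compression step is driven from a script, the same command can be assembled as an argument list. This is a minimal sketch that uses only the three flags shown above; the `compress_command` helper is an illustrative name, and actually running the command assumes FastDeploy is installed.

```python
import shlex


def compress_command(config_path, method, save_dir):
    """Build the argv list for `fastdeploy compress` from the flags shown above."""
    return [
        "fastdeploy",
        "compress",
        f"--config_path={config_path}",
        f"--method={method}",
        f"--save_dir={save_dir}",
    ]


argv = compress_command(
    "./configs/detection/yolov5s_quant.yaml", "PTQ", "./yolov5s_ptq_model/"
)
print(shlex.join(argv))
# To run it for real (requires FastDeploy on PATH):
#   import subprocess; subprocess.run(argv, check=True)
```

Passing a list rather than a shell string avoids quoting problems when paths contain spaces.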

FastDeploy has now completed adaptation testing of quantized models on the following backends:

Hardware/Deployment backend | ONNX Runtime | Paddle Inference | TensorRT | Paddle Inference TensorRT | Paddle Lite
--- | --- | --- | --- | --- | ---
CPU | Supported | Supported | - | - | Supported
GPU | - | - | Supported | Supported | -
RK1126 | - | - | - | - | Supported

The following table compares accuracy and performance before and after auto-compression: overall accuracy is virtually unchanged, and performance improves by 100%~400%.
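
To make the 100%~400% range concrete, here is a small worked calculation. It assumes the percentage is a throughput gain relative to the unquantized model (so +100% means 2x and +400% means 5x); the release note does not spell out the metric, and the 20 ms baseline is a made-up number.

```python
def improved_latency_ms(baseline_ms, improvement_pct):
    """Latency after a throughput gain of `improvement_pct` percent (assumed metric)."""
    speedup = 1 + improvement_pct / 100  # +100% -> 2x, +400% -> 5x
    return baseline_ms / speedup


baseline = 20.0  # hypothetical latency of the unquantized model, in milliseconds
for pct in (100, 400):
    print(pct, improved_latency_ms(baseline, pct))
```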


For more details and usage of the one-click quantization, see FastDeploy one-click quantization.

Model Conversion

To make it easier to deploy models from multiple frameworks, FastDeploy integrates X2Paddle conversion. After installing FastDeploy, the following command converts a model so that it can be deployed with FastDeploy.

fastdeploy convert --framework onnx --model yolov5s.onnx --save_dir yolov5s_paddle_model
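
For converting several models in a row, the command can be generated per file. The sketch below reuses only the flags shown above and derives the output directory the same way the example does (`yolov5s.onnx` becomes `yolov5s_paddle_model`); the `convert_command` helper and that naming rule are assumptions for illustration.

```python
from pathlib import Path


def convert_command(model, framework="onnx"):
    """Build the argv list for `fastdeploy convert`, mirroring the example above."""
    save_dir = f"{Path(model).stem}_paddle_model"  # naming rule assumed from the example
    return [
        "fastdeploy", "convert",
        "--framework", framework,
        "--model", model,
        "--save_dir", save_dir,
    ]


print(convert_command("yolov5s.onnx"))
```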

For more information on how to use it, see FastDeploy Model Conversion.

End-to-end Deployment Performance Optimisation

FastDeploy focuses on the end-to-end deployment experience and performance in each model deployment. In version 1.0, FastDeploy has made the following end-to-end optimisations:

  • Server side: preprocessing steps are fused to reduce memory allocation overhead and computation
  • Mobile: integrates FlyCV, a high-performance image processing library developed in-house by Baidu's vision technology department
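
As an illustration of what fusing preprocessing steps can mean, the sketch below folds a typical scale, subtract-mean, divide-by-std chain into a single multiply-add. It is a generic example in plain Python, not FastDeploy's actual implementation, and the mean and std values are placeholders.

```python
import math


def normalize_sequential(pixels, mean, std):
    """Three separate passes, each creating an intermediate buffer."""
    scaled = [p / 255.0 for p in pixels]
    centered = [s - mean for s in scaled]
    return [c / std for c in centered]


def normalize_fused(pixels, mean, std):
    """One pass: (p / 255 - mean) / std rewritten as p * alpha + beta."""
    alpha = 1.0 / (255.0 * std)
    beta = -mean / std
    return [p * alpha + beta for p in pixels]


pixels = [0, 64, 128, 255]
a = normalize_sequential(pixels, mean=0.485, std=0.229)
b = normalize_fused(pixels, mean=0.485, std=0.229)
print(all(math.isclose(x, y, abs_tol=1e-9) for x, y in zip(a, b)))
```

The fused form touches each pixel once and allocates one output buffer instead of three, which is the kind of saving the first bullet describes.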

Combined with FastDeploy's multi-backend support, the end-to-end inference performance of all models is significantly improved compared to the original deployment code. The following table shows test data for some of the models.

Thanks to the following developers for their contributions to FastDeploy! Contributors List
@leiqing1 @jiangjiajun @DefTruth @joey12300 @felixhjh @ziqi-jin @yunyaoXYY @wjj19950828 @heliqi @ZeyuChen @ChaoII @Zheng-Bicheng @wang-xinyu @HexToString @yeliang2258 @WinterGeng @LDOUBLEV @rainyfly @czr-gc @chenqianhe @kiddyjinjin @Zeref996 @TrellixVulnTeam @D-DanielYang @totorolin @hguandl @ChrisKong93 @Xiue233 @jm12138 @triple-Mu @yingshengBD @GodIsBoom @PatchTester @onecatcn

FastDeploy - FastDeploy 0.8.0

Published by jiangjiajun almost 2 years ago


0.8.0 Release Note

Image Classification | Object Detection | Semantic Segmentation | OCR | Face Detection
--- | --- | --- | --- | ---
Project Code | Project Code | Project Code | Project Code | Project Code
Scan the code or click on the link to install and try out | Scan the code or click on the link to install and try out | Scan the code or click on the link to install and try out | Scan the code or click on the link to install and try out | Scan the code or click on the link to install and try out


FastDeploy - FastDeploy 0.7.0 Release Note

Published by jiangjiajun almost 2 years ago

0.7.0 Release Note

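  • Integrate Paddle Lite TIM-VX, supporting deployment on RK1 chips. Details
  • Add RKNPU2 deployment support for the face detection model SCRFD. Deployment example
  • Add a Stable Diffusion model deployment example. Deployment example
  • Upgrade the PaddleClas/PaddleDetection/YOLOv5 deployment code to support predict and batch_predict
  • Support converting Paddle models larger than 2G to ONNX for deployment
  • Add a PaddleClas model serving deployment example. Deployment example
  • Add a Pad function operator for FDTensor to support padding inputs during batch prediction
  • Add the Python API to_dlpack for FDTensor to support zero-copy transfer of FDTensor between frameworks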

0.7.0 Release Note

  • Integrate Paddle Lite TIM-VX to support hardware such as Rockchip RV1126. Details
  • Support the face detection model SCRFD on Rockchip RK3588, RK3568 and other hardware.
  • Support Stable Diffusion model deployment.
  • Upgrade the PaddleClas, PaddleDetection and YOLOv5 deployment code to support predict and batch_predict.
  • Support converting Paddle models larger than 2G to ONNX for deployment.
  • Support service-based deployment of PaddleClas models.
  • Add a Pad function operator for FDTensor to support padding of inputs during batch prediction.
  • Add the Python API to_dlpack for FDTensor to support zero-copy transfer of FDTensor between frameworks.
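
The padding item is easiest to see with a toy batch: inputs of different sizes must share one shape before they can be stacked. The sketch below pads nested Python lists on the bottom and right; it does not use the FDTensor API, and the padding side and fill value are assumptions.

```python
def pad_batch(images, pad_value=0):
    """Pad 2-D images (lists of rows) to the largest height and width in the batch."""
    max_h = max(len(img) for img in images)
    max_w = max(len(img[0]) for img in images)
    batch = []
    for img in images:
        # Extend each row on the right, then append full rows at the bottom.
        rows = [row + [pad_value] * (max_w - len(row)) for row in img]
        rows += [[pad_value] * max_w for _ in range(max_h - len(img))]
        batch.append(rows)
    return batch


batch = pad_batch([[[1, 2], [3, 4]], [[5]]])
print(batch)
```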


FastDeploy - FastDeploy 0.6.0 Release Note

Published by jiangjiajun almost 2 years ago

0.6.0 Release Note


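  • Add the FSANet head pose recognition model. Details
  • Add the PFLD face alignment model. Details
  • Add trajectory visualization to the PP-Tracking model. Details
  • Add the ERNIE text classification model. Details


  • FastDeploy Runtime adds Clone interface support, reducing memory and GPU memory usage of the Paddle Inference/TensorRT/OpenVINO backends when running multiple instances


  • Add RKNPU2 (3588) deployment support. Details


  • Optimize the memory allocation logic in pre- and post-processing for the YOLO series, PaddleClas and PaddleDetection
  • Fuse vision preprocessing operations to improve PaddleClas and PaddleDetection preprocessing performance
  • Integrate the TensorRT BatchedNMSDynamic_TRT plugin to improve TensorRT end-to-end deployment performance


  • Fix several documentation issues
  • Add FastDeploy Runtime C++ usage examples. Details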

0.6.0 Release Note


  • Support FSANet head pose recognition model Details
  • Support PFLD face alignment model Details
  • PP-Tracking model adds track visualisation Details
  • Support ERNIE text classification model Details

Service-based Deployment

  • FastDeploy Runtime adds Clone interface support for service-based deployment, reducing the memory and GPU memory usage of the Paddle Inference, TensorRT and OpenVINO backends when running multiple instances.

Edge Deployment

  • Support RKNPU2 (3588) deployment. Details

Performance Optimisation

  • Optimize the memory allocation logic in preprocessing and postprocessing for the YOLO series, PaddleClas and PaddleDetection.
  • Fuse vision preprocessing operations to improve the preprocessing performance of PaddleClas and PaddleDetection, and with it end-to-end performance.
  • Integrate the TensorRT BatchedNMSDynamic_TRT plugin to improve the performance of TensorRT end-to-end deployments.
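
For readers unfamiliar with the step that the NMS plugin moves into the TensorRT engine, here is a plain-Python sketch of greedy non-maximum suppression. It shows the general algorithm only, not the plugin's implementation; boxes are assumed to be `(x1, y1, x2, y2)` tuples and the threshold is an arbitrary example value.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring boxes, dropping any that overlap a kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))
```

Running this on the CPU after inference means copying every candidate box off the GPU first, which is the overhead an in-engine plugin avoids.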


  • Fix several documentation issues
  • Add FastDeploy Runtime C++ usage examples. Details


FastDeploy - FastDeploy 0.5.0

Published by jiangjiajun almost 2 years ago

What's Changed


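  • Add inference support through Paddle Inference TensorRT
  • Add inference support on IPU hardware through Paddle Inference
  • Fix native TensorRT's inability to handle INT64 input and output data
  • Add multi-stream support to the ONNX Runtime, Paddle Inference and TensorRT backends


  • Add the tracking model PP-Tracking. Example
  • Add the RobustVideoMatting video model. Example
  • Add documentation of the FastDeploy model integration development workflow. Documentation


  • Fix PP-Matting prediction with non-fixed shapes
  • Fix the Python visualization function for semantic segmentation models
  • Fix the usage documentation of some models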


FastDeploy - FastDeploy 0.4.0

Published by jiangjiajun about 2 years ago


What's Changed


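  • Add the FastDeploy Android C++ inference library, supporting the arm64-v8a and armeabi-v7a architectures; see Prebuilt Library Download
  • Add Android deployment of the object detection model PicoDet; see the example
  • Add Android deployment of the PaddleClas image classification model series; see the example


  • Optimize end-to-end GPU deployment performance for YOLOv5/6/7: after enabling GPU preprocessing with YOLOv5::UseCudaPreprocessing(), performance on a T4 GPU (TensorRT) improves by 30%~50%; see the PR description
  • Add 7 web-side js deployment examples; see the js deployment examples
  • Add deployment support for TinyPose and for the cascaded PicoDet+TinyPose pipeline; see the example
  • Add deployment support for the Torch Vision ResNet model series; see the example
  • Rename PPOCRSystemv2 & PPOCRSystemv3 to PPOCRv2 & PPOCRv3
  • Improve the warning messages of some PaddleSeg & PaddleOCR models


  • Add serving deployment for the TTS model; see the example
  • Add serving deployment for ERNIE 3.0; see the example
  • Fix the core dump issue in the serving CPU deployment image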




FastDeploy - FastDeploy v0.3.0

Published by jiangjiajun about 2 years ago

What's Changed


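  • Add deployment support for PaddleSeg's PP-ModNet and PP-HumanMatting. Deployment example
  • Add deployment support for the YOLOv5-Classification model. Deployment example


  • Provide a one-click quantization tool based on PaddleSlim, giving multi-fold deployment speedups on CPU/GPU. Details
  • Support one-click quantization acceleration for the YOLO series and the PaddleClas image classification series. Details


  • Support building against OpenCV, OpenVINO and ONNX Runtime dependencies located at user-specified custom paths
  • Add build support for the OpenVINO backend on Mac x86
  • Add Paddle-Lite backend support on Arm
  • Support building and installing on Jetson. Reference documentation


  • Release FastDeploy-Triton serving images for CPU/GPU deployment, supporting high-performance multi-backend serving of Paddle/ONNX models. Details
  • Add a YOLOv5 serving deployment example. Details


  • Fix the issue where the input image was modified during model Predict
  • Add an interface for setting max_workspace_size on the TensorRT backend
  • Improve the messages shown for PaddleSeg deployment models under dynamic shape
  • Fix the failure to load TensorRT serialized files on Windows
  • Add fastdeploy_init.sh and fastdeploy_init.bat to help developers quickly import the FastDeploy dependency libraries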


FastDeploy - FastDeploy v0.2.1

Published by jiangjiajun about 2 years ago

What's Changed



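  • Add the OpenVINO inference backend. Thanks to support from the OpenVINO team, most Paddle models can now use OpenVINO to accelerate inference on CPU
  • Improve the TensorRT user experience: there is no longer any need to call SetTrtInputShape manually to set the input range; it is now set dynamically during inference by default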
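
One way to picture setting the input range dynamically is a tracker that widens a per-dimension min/max as new shapes arrive. The sketch below is conceptual only; the `ShapeRange` class is hypothetical, and the release note does not describe how FastDeploy implements this.

```python
class ShapeRange:
    """Track the smallest and largest size seen for each input dimension."""

    def __init__(self):
        self.min_shape = None
        self.max_shape = None

    def update(self, shape):
        """Record a shape; return True if the range grew (a rebuild would be needed)."""
        shape = tuple(shape)
        if self.min_shape is None:
            self.min_shape = self.max_shape = shape
            return True
        new_min = tuple(min(a, b) for a, b in zip(self.min_shape, shape))
        new_max = tuple(max(a, b) for a, b in zip(self.max_shape, shape))
        grew = (new_min, new_max) != (self.min_shape, self.max_shape)
        self.min_shape, self.max_shape = new_min, new_max
        return grew


r = ShapeRange()
print(r.update((1, 3, 224, 224)), r.update((1, 3, 224, 224)), r.update((4, 3, 320, 320)))
```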


  • Add usage documentation covering building, SDK usage and more
  • Fix several usability issues when building and using FastDeploy on Windows


FastDeploy - FastDeploy v0.2.0

Published by jiangjiajun about 2 years ago


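  • Integrate the Paddle Inference, ONNX Runtime and TensorRT backends, with support for automatically selecting the best inference backend for the model.
  • Support building from source for more flexible backend selection; see the FastDeploy build documentation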



FastDeploy - FastDeploy v0.1.0

Published by jiangjiajun over 2 years ago

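⚡️FastDeploy v0.1.0 beta released! 🎉
💎 Release SDKs supporting 40 key models across 8 key software and hardware environments
😊 Two ways to download and use it: web page and pip package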
