Cross-language and distributed deep learning inference pipeline for WebRTC video streams over Redis Streams. Currently supports the YOLOX model, which runs well even on CPU.
Example output image from the YOLOX model, showing what this application does in the end.
Topology diagram of this project.
This project aims to demonstrate an approach to designing a cross-language and distributed pipeline in the deep learning/machine learning domain. Tons of demos and examples can be found on the internet, but they are mostly developed end-to-end in Python only. This project is one of the cross-language examples.
In this project, a Kubernetes-like orchestrator was deliberately not used; instead, independent Docker engines on different bare-metal host machines were configured. The aim is to show how to set things up across multiple bare-metal hosts or multi-datacenter environments using only Docker.
This project consists of a WebRTC signaling and orchestrator service (Go), a WebRTC media server service (Go), a YOLOX deep learning inference service (Python), and a web front-end (TypeScript). It also includes a monitoring stack.
Technologies used for functionality:
Technologies used for monitoring:
More details on the project and the monitoring configuration can be found in the docs folder.
To access the web application UI, you can visit http://localhost:9000 (tested on Chrome) after configuring the containers.
When you click on the "Create PeerConnection" button, if everything is configured correctly:
1. The browser connects to the Media Bridge service via WebRTC,
2. the Media Bridge service captures frame images from the video as JPEG images and pushes them to Redis Streams,
3. Inference services pop JPEG image data from the Redis stream (STREAM_IMAGES = "images"), execute the YOLOX inference model, and push the detected objects' names, box coordinates, prediction scores, and the resolution to another Redis stream (STREAM_PREDICTIONS = "predictions"),
4. the Signaling service listens to and consumes the predictions Redis stream, and sends the results to the relevant participant (by the participantId in the JSON) via WebSockets.

Client side logs:
To see monitoring metrics via Grafana, you can visit http://localhost:9000/grafana.
Monitoring topology:
For more details, read the Monitoring documentation.
Accessing Grafana:
To see monitoring metrics via Grafana, you can visit http://localhost:9000/grafana after configuring the containers.
Login screen: You should log in to Grafana with user admin, password admin, if you didn't change the GF_SECURITY_ADMIN_PASSWORD environment variable. After that, it will ask you to change the password; you can click on the Skip button.
Grafana Dashboard while the system was processing one client that was sending webcam video at approximately 30 frames per second (fps). We can see that the desktop host machine with GPUs processed more frame images than the mbp host machine.
nvidia-smi tool output of my desktop machine (with GPUs) while running inference was like this (you can see 5 Python processes interacting with the GPU, and we have 5 replicas running on this host machine):

This project was designed to run in Docker containers. For some of the configuration, you can check out the docker-compose.yaml and .env files in the root folder.
The Docker Compose file creates several containers, some with multiple replica instances:
redis: Runs a Redis instance.
web: Runs an Nginx instance that proxies HTTP and WebSocket endpoints.
signaling service: The only orchestrator in the project. Other services register themselves with this application. It also serves a WebSocket endpoint for the WebRTC signaling function. When a WebRTC client comes to join, this service selects one of the registered media bridge services and brings the WebRTC client and the media bridge together. Written in Go.
mediabridge service: The service registers itself with the orchestrator and responds to the "sdp-offer-req" and "sdp-accept-offer-answer" procedure calls. It uses the Pion WebRTC library to act as a WebRTC server on a UDP port. Written in Go. Designed to run as more than one instance, but currently only a single instance can run: it must expose a UDP port from the Docker container, different container replicas on the same host would need different available port numbers, and the application must know the exposed host port number. At this stage, managing this dynamically couldn't be achieved; maybe it can be achieved on Kubernetes.
inference: The service registers itself with the orchestrator and keeps track of the "images" Redis stream, which carries the JPEG image data of video frames. Written in Python. It runs inference on incoming images with the YOLOX model for object detection. Depending on the configuration or the chosen Docker Compose profile, it can run in CPU or CUDA mode, and it can also run as a distributed service on different machines. There can be more than one instance, set by the replica values in the docker-compose.yml file; the default is 5.
ui: The web frontend. It gets a media stream from the webcam and forwards it to the assigned mediabridge service via WebRTC.
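The inference service's consume/produce loop against Redis Streams can be sketched as below. This is an illustration of the idea, not the project's actual code: the stream names come from the description above, but the entry field names (image, participantId), the stub detect_objects function, and the fixed resolution are my assumptions. It requires the redis Python package.

```python
import json
import os

STREAM_IMAGES = "images"            # frames pushed by the media bridge
STREAM_PREDICTIONS = "predictions"  # results consumed by the signaling service

def detect_objects(jpeg_bytes):
    # Placeholder for the actual YOLOX model call.
    return [{"name": "person", "score": 0.9, "box": [0, 0, 100, 200]}]

def build_prediction_message(participant_id, detections, width, height):
    # Package what the pipeline description lists: detected objects' names,
    # box coordinates, prediction scores, and the frame resolution.
    return {
        "participantId": participant_id,
        "resolution": {"width": width, "height": height},
        "predictions": detections,
    }

def run_worker():
    import redis  # pip install redis; imported lazily to keep the sketch importable
    r = redis.Redis(host=os.getenv("REDIS_HOST", "redis"),
                    port=int(os.getenv("REDIS_PORT", "6379")))
    last_id = "$"  # only consume frames that arrive after startup
    while True:
        # Block until new frames arrive on the "images" stream.
        for _stream, entries in r.xread({STREAM_IMAGES: last_id}, block=0):
            for entry_id, fields in entries:
                last_id = entry_id
                detections = detect_objects(fields[b"image"])
                msg = build_prediction_message(fields[b"participantId"].decode(),
                                               detections, 640, 480)
                r.xadd(STREAM_PREDICTIONS, {"message": json.dumps(msg)})

if __name__ == "__main__":
    run_worker()
```

Note that with 5 replicas, plain XREAD would deliver every frame to every worker; a Redis consumer group (XREADGROUP with XACK) is the usual way to make replicas share a stream, which may be what the actual service does.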
You can run it in production mode or development mode.
Before choosing one of the options below, this step should be done:
In the .env file, give HOSTNAME_TAG a different value on each host to differentiate host metrics in Grafana:
...
HOSTNAME_TAG=tag_name_for_host_machine
...
On Linux, DOCKER_SOCKET_PREFIX and DOCKER_SOCKET_SUFFIX should be left blank, as:
...
DOCKER_SOCKET_PREFIX=""
DOCKER_SOCKET_SUFFIX=""
...
On Windows, DOCKER_SOCKET_PREFIX should be "/", as (see: https://stackoverflow.com/a/41005007):
...
DOCKER_SOCKET_PREFIX="/"
DOCKER_SOCKET_SUFFIX=""
...
On macOS, DOCKER_SOCKET_SUFFIX should be ".raw", as (see: https://github.com/docker/for-mac/issues/4755):
...
DOCKER_SOCKET_PREFIX=""
DOCKER_SOCKET_SUFFIX=".raw"
...
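As I read these settings, the two variables simply wrap the standard Docker socket path that gets mounted into the monitoring containers (prefix + /var/run/docker.sock + suffix). A small illustrative sketch of how the values compose:

```python
def docker_socket_path(prefix: str, suffix: str) -> str:
    # Compose the host Docker socket path from the .env adjustments.
    return f"{prefix}/var/run/docker.sock{suffix}"

# Linux: both left blank
assert docker_socket_path("", "") == "/var/run/docker.sock"
# The "/" prefix workaround yields a double leading slash
assert docker_socket_path("/", "") == "//var/run/docker.sock"
# The ".raw" suffix targets Docker Desktop's raw socket on macOS
assert docker_socket_path("", ".raw") == "/var/run/docker.sock.raw"
```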
There are different Docker Compose profiles for the different configurations you can choose from:
This profile runs all services on a single host machine that has no graphics card supporting CUDA. The Redis instance stays internal and is not exposed to the network.
$ docker-compose --profile single_host_cpu up -d
This profile runs all services on a single host machine that has at least one graphics card supporting CUDA. The Redis instance stays internal and is not exposed to the network.
$ docker-compose --profile single_host_gpu up -d
This profile runs all services on a single host machine that has no graphics card supporting CUDA. The Redis instance will be exposed to the network, so further inference services on different hosts can register.
Similar to single_host_cpu, it provides all services by itself, but also supports extra hosts.
$ docker-compose --profile central_with_inference_cpu up -d
This profile runs all services on a single host machine that has at least one graphics card supporting CUDA. The Redis instance will be exposed to the network, so further inference services on different hosts can register.
Similar to single_host_cpu, it provides all services by itself, but also supports extra hosts.
$ docker-compose --profile central_with_inference_gpu up -d
This profile runs only the central services on the host machine, without inference services. It doesn't function unless extra inference services are properly registered with the Signaling service. The Redis instance will be exposed to the network, so inference services on different hosts can register.
$ docker-compose --profile central up -d
In the .env file, specify the central host's Redis server and InfluxDB server connection information with the REDIS_HOST, REDIS_PORT, INFLUXDB_HOST, and INFLUXDB_PORT environment variables, using your central host's IP:
...
REDIS_HOST=ip_of_central_host # should be "redis" if single host configuration, e.g. 192.168.0.15 in distributed configuration
REDIS_PORT=port_of_central_host_redis_port # default is 6379
...
INFLUXDB_HOST=ip_of_central_host # should be "influxdb" if single host configuration, e.g. 192.168.0.15 in distributed configuration
INFLUXDB_PORT=port_of_central_host_influxdb_port # default is 8086
...
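A sketch of how a service on an extra host could resolve these settings at startup; the variable names match the .env fragment above, while the function name and fallback defaults are my own for illustration:

```python
import os

def connection_settings(env=None):
    env = os.environ if env is None else env
    # Single-host setups use the Compose service names ("redis", "influxdb");
    # distributed setups point at the central host's IP instead.
    return {
        "redis": (env.get("REDIS_HOST", "redis"),
                  int(env.get("REDIS_PORT", "6379"))),
        "influxdb": (env.get("INFLUXDB_HOST", "influxdb"),
                     int(env.get("INFLUXDB_PORT", "8086"))),
    }

# Distributed example: central host reachable at 192.168.0.15
cfg = connection_settings({"REDIS_HOST": "192.168.0.15"})
```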
For CPU-only mode:
$ docker-compose --profile inference_cpu up -d
For GPU-supported mode:
$ docker-compose --profile inference_gpu up -d
$ docker-compose logs -f
To continue with VS Code: if this is your first time working with Remote Containers in VS Code, you can check out this link to learn how Remote Containers work and follow the installation steps of the Remote Development extension pack.
Then, follow these steps:
Check the service key at .devcontainer/devcontainer.json of the particular service folder you want to debug.

The Distributed Deep Learning Inference Pipeline project is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.