The idea of this project is to turn any wall or flat surface into an interactive whiteboard using only an ordinary RGB camera and your hand. I hope you'll find it interesting!
Note: the system also works on Jetson Nano and TX2.
To use the AI whiteboard correctly, find a wall or flat surface and place the camera at a distance of about 1 meter. Any wall or surface will do, but the system is more accurate on dark or light monochrome surfaces. The pipeline works as follows: we capture an image from the camera and crop it to a square. Then a hand detector [1] (YOLO [3], a deep neural network) searches for a hand in the image. If a hand is found, we crop it out of the image and feed it to a fingertip detector [1] (a modified VGG16 network). Finally, if fingertips are detected, their coordinates are used to control the whiteboard (see the control section below).
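The steps above can be sketched roughly as follows. The `hand_detector` and `fingertip_detector` callables stand in for the YOLO and modified-VGG16 networks, and the centered crop is only an assumption about how the square crop is taken:

```python
import numpy as np

def center_crop_square(img):
    """Crop the frame to a centered square (assumed crop strategy)."""
    h, w = img.shape[:2]
    s = min(h, w)
    y0, x0 = (h - s) // 2, (w - s) // 2
    return img[y0:y0 + s, x0:x0 + s]

def process_frame(frame, hand_detector, fingertip_detector):
    """One pipeline iteration: square crop -> hand box -> fingertip coords."""
    square = center_crop_square(frame)
    box = hand_detector(square)          # YOLO: hand bounding box or None
    if box is None:
        return None                      # no hand found, nothing to draw
    x0, y0, x1, y1 = box
    hand = square[y0:y1, x0:x1]          # crop the detected hand
    return fingertip_detector(hand)      # modified VGG16: fingertip coordinates
```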
$ git clone https://github.com/preste-ai/camera_ai_whiteboard.git
Install the required packages via pip using the requirements.txt file:
pip3 install -r requirements.txt
Place the model weights in weights
(Keras .h5 models) or weights/engines
(TensorRT engines).
Note: the current TensorRT engines work correctly only on Jetson Xavier NX devices, because TensorRT runs device-specific profiling during the optimization phase. If you want to use these models (engines) on other Jetson devices, please convert the .h5 models with the h5_to_trt.py
script on your platform.
$ sudo /usr/sbin/nvpmodel -m 2
$ sudo jetson_clocks --fan
Check the config.py
file and set the parameters you need.
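config.py groups the tunable parameters in one place. The sketch below is only illustrative; the parameter names are hypothetical, so check the real file in the repository for the actual options:

```python
# Illustrative config.py sketch -- parameter names are hypothetical,
# see the real config.py in the repository for the actual options.
CAMERA_ID = 0                # index of the RGB camera to open
IMAGE_SIZE = (320, 240)      # captured frame size (width, height)
HAND_CONF_THRESHOLD = 0.5    # minimum detector confidence to accept a hand
IOU_THRESHOLD = 0.5          # IoU cutoff used when evaluating detections
USE_TENSORRT = False         # set True on Jetson devices with built engines
```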
Run from the project root directory:
Jetson Devices
python3 ai_whiteboard.py --rpc --trt
Laptop
python3 ai_whiteboard.py
| To draw | To move | To erase | To clean | To save |
|---|---|---|---|---|
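Purely as an illustration of how detected fingertip states can be dispatched to whiteboard actions, here is a hypothetical mapping; the project's real gesture set is the one illustrated above, not this one:

```python
# Hypothetical gesture dispatch -- the real gesture-to-action mapping is the
# one illustrated in the controls table, not this one.
def gesture_to_action(fingers_up):
    """Map a tuple of per-finger up/down flags (thumb..pinky) to an action."""
    n = sum(fingers_up)
    if n == 1 and fingers_up[1]:               # index finger only
        return "draw"
    if n == 2 and fingers_up[1] and fingers_up[2]:  # index + middle
        return "move"
    if n == 5:                                 # open palm
        return "erase"
    return "idle"
```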
A custom dataset of 12,000 images was collected and labeled for training. The labeling was done with CVAT.
Run from the project root directory:
python3 yolo_train.py
Run from the project root directory:
python3 yolo_test.py
The transformation takes place in 3 stages:
Run from the project root directory:
python3 h5_to_trt.py --folder weights --weights_file yolo --fp 16
Metrics for hand detection after model conversion.
To determine whether a detection is correct, we use the IoU (intersection over union) value: if the IoU is greater than 0.5, the detector's prediction counts as a correct hand detection; otherwise it does not. The results are given below.
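The IoU of two boxes can be computed as in this minimal sketch, with boxes given as `(x0, y0, x1, y1)` corners:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # intersection rectangle (empty if the boxes do not overlap)
    ix0, iy0 = max(ax0, bx0), max(ay0, by0)
    ix1, iy1 = min(ax1, bx1), min(ay1, by1)
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union else 0.0

# a detection counts as correct when IoU with the ground-truth box > 0.5
```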
| | keras model before training | keras model after training | TensorRT engine (fp32) | TensorRT engine (fp16) |
|---|---|---|---|---|
| Accuracy | 72.68 % | 89.14 % | 89.14 % | 89.07 % |
| Precision | 84.80 % | 99.45 % | 99.45 % | 99.45 % |
| Recall | 50.78 % | 77.24 % | 77.24 % | 77.10 % |
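For reference, the metrics in the table follow the standard definitions over confusion counts (a sketch; a prediction counts as a true positive when its IoU with the ground truth exceeds 0.5):

```python
def detection_metrics(tp, fp, tn, fn):
    """Accuracy, precision and recall from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall
```

The pattern in the table (precision near 99 %, recall around 77 %) means the trained detector produces very few false positives but still misses some hands.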
Captured image shape: 320×240. Jetson Xavier NX, power mode ID 2: 15W, 6 cores.
| | keras model | TensorRT engine (fp32) | TensorRT engine (fp16) |
|---|---|---|---|
| Average FPS | 12 | 33 | 60 |
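Average FPS figures like these can be reproduced with a simple timing loop (a sketch; `process_frame` stands in for one full pipeline iteration):

```python
import time

def average_fps(process_frame, frames):
    """Run every frame through the pipeline and return frames per second."""
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")
```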