The idea of this project is to turn any wall or flat surface into an interactive whiteboard using only an ordinary RGB camera and your hand. I hope you'll find it interesting!
Note: the system also works on Jetson Nano and TX2.
To use the AI whiteboard correctly, find a wall or flat surface and place the camera at a distance of about 1 meter. Any wall or surface will do, but the system is more accurate on dark or light monochrome surfaces. The pipeline works as follows: we capture an image from the camera and crop it to a square. Then a hand detector [1] (YOLO [3], a deep neural network) searches for a hand in the image. If a hand is found, we crop it out of the image and feed it to a fingertip detector [1] (a modified VGG16 network). Finally, if fingertips are detected, their coordinates are used to control the whiteboard (see the control section below).
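The steps above can be sketched roughly as follows. The `hand_detector` and `fingertip_detector` callables stand in for the YOLO and modified-VGG16 networks, and the centered crop is only an assumption about how the square crop is taken:

```python
import numpy as np

def center_crop_square(img):
    """Crop the frame to a centered square (assumed crop strategy)."""
    h, w = img.shape[:2]
    s = min(h, w)
    y0, x0 = (h - s) // 2, (w - s) // 2
    return img[y0:y0 + s, x0:x0 + s]

def process_frame(frame, hand_detector, fingertip_detector):
    """One pipeline iteration: square crop -> hand box -> fingertip coords."""
    square = center_crop_square(frame)
    box = hand_detector(square)          # YOLO: hand bounding box or None
    if box is None:
        return None                      # no hand found, nothing to draw
    x0, y0, x1, y1 = box
    hand = square[y0:y1, x0:x1]          # crop the detected hand
    return fingertip_detector(hand)      # modified VGG16: fingertip coordinates
```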
$ git clone https://github.com/preste-ai/camera_ai_whiteboard.git
Install the required packages via pip using the requirements.txt file:
pip3 install -r requirements.txt
Place the model weights in weights
(Keras .h5 models) or weights/engines
(TensorRT engines).
Note: the current TensorRT engines work correctly only on Jetson Xavier NX devices, because TensorRT runs device-specific profiling during the optimization phase. If you want to use these models (engines) on other Jetson devices, please convert the .h5 models with the h5_to_trt.py
script on your platform.
$ sudo /usr/sbin/nvpmodel -m 2
$ sudo jetson_clocks --fan
Check the config.py
file and set the parameters you need.
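config.py groups the tunable parameters in one place. The sketch below is only illustrative; the parameter names are hypothetical, so check the real file in the repository for the actual options:

```python
# Illustrative config.py sketch -- parameter names are hypothetical,
# see the real config.py in the repository for the actual options.
CAMERA_ID = 0                # index of the RGB camera to open
IMAGE_SIZE = (320, 240)      # captured frame size (width, height)
HAND_CONF_THRESHOLD = 0.5    # minimum detector confidence to accept a hand
IOU_THRESHOLD = 0.5          # IoU cutoff used when evaluating detections
USE_TENSORRT = False         # set True on Jetson devices with built engines
```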
Run from the project root directory:
Jetson Devices
python3 ai_whiteboard.py --rpc --trt
Laptop
python3 ai_whiteboard.py
| To draw | To move | To erase | To clean | To save |
|---|---|---|---|---|
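Purely as an illustration of how detected fingertip states can be dispatched to whiteboard actions, here is a hypothetical mapping; the project's real gesture set is the one illustrated above, not this one:

```python
# Hypothetical gesture dispatch -- the real gesture-to-action mapping is the
# one illustrated in the controls table, not this one.
def gesture_to_action(fingers_up):
    """Map a tuple of per-finger up/down flags (thumb..pinky) to an action."""
    n = sum(fingers_up)
    if n == 1 and fingers_up[1]:               # index finger only
        return "draw"
    if n == 2 and fingers_up[1] and fingers_up[2]:  # index + middle
        return "move"
    if n == 5:                                 # open palm
        return "erase"
    return "idle"
```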
A custom dataset of 12,000 images was collected and labeled for training. The labeling was done with CVAT.
Run from the project root directory:
python3 yolo_train.py
Run from the project root directory:
python3 yolo_test.py
The transformation takes place in 3 stages:
Run from the project root directory:
python3 h5_to_trt.py --folder weights --weights_file yolo --fp 16
Metrics for hand detection after model conversion.
To determine whether a detection is correct, we use the IoU (intersection over union) value: if the IoU is greater than 0.5, the detector's prediction counts as a correct hand detection; otherwise it does not. The results are given below.
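The IoU of two boxes can be computed as in this minimal sketch, with boxes given as `(x0, y0, x1, y1)` corners:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # intersection rectangle (empty if the boxes do not overlap)
    ix0, iy0 = max(ax0, bx0), max(ay0, by0)
    ix1, iy1 = min(ax1, bx1), min(ay1, by1)
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union else 0.0

# a detection counts as correct when IoU with the ground-truth box > 0.5
```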
| | keras model before training | keras model after training | TensorRT engine (fp32) | TensorRT engine (fp16) |
|---|---|---|---|---|
| Accuracy | 72.68 % | 89.14 % | 89.14 % | 89.07 % |
| Precision | 84.80 % | 99.45 % | 99.45 % | 99.45 % |
| Recall | 50.78 % | 77.24 % | 77.24 % | 77.10 % |
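For reference, the metrics in the table follow the standard definitions over confusion counts (a sketch; a prediction counts as a true positive when its IoU with the ground truth exceeds 0.5):

```python
def detection_metrics(tp, fp, tn, fn):
    """Accuracy, precision and recall from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall
```

The pattern in the table (precision near 99 %, recall around 77 %) means the trained detector produces very few false positives but still misses some hands.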
Captured image shape: 320×240. Jetson Xavier NX, power mode ID 2: 15W, 6 cores.
| | keras model | TensorRT engine (fp32) | TensorRT engine (fp16) |
|---|---|---|---|
| Average FPS | 12 | 33 | 60 |
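Average FPS figures like these can be reproduced with a simple timing loop (a sketch; `process_frame` stands in for one full pipeline iteration):

```python
import time

def average_fps(process_frame, frames):
    """Run every frame through the pipeline and return frames per second."""
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")
```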