# Nodding Pigeon

Detection and classification of head gestures in videos.

MIT License.
The Nodding Pigeon library provides a pre-trained model and a simple inference API for detecting head gestures in short videos. Under the hood, it uses Google MediaPipe for collecting the landmark features.
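As a rough illustration of that last point, here is a sketch (not the library's internal implementation; the keypoint handling is an assumption) of how per-frame landmark features can be obtained with MediaPipe's face-detection solution:

```python
# Sketch only: collecting per-frame face landmarks with MediaPipe.
# This is NOT the library's internal implementation.
import cv2
import mediapipe as mp

cap = cv2.VideoCapture(0)  # Webcam; a video file path also works.
with mp.solutions.face_detection.FaceDetection(model_selection=0) as detector:
    ok, frame_bgr = cap.read()
    if ok:
        # MediaPipe expects RGB input.
        results = detector.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
        if results.detections:
            # Six relative keypoints (eyes, nose tip, mouth, ears) per face.
            keypoints = results.detections[0].location_data.relative_keypoints
            print([(round(kp.x, 3), round(kp.y, 3)) for kp in keypoints])
cap.release()
```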
## Installation

Tested for Python 3.8, 3.9, and 3.10.

The best way to install this library with its dependencies is from PyPI:

```bash
python3 -m pip install --upgrade noddingpigeon
```

Alternatively, to obtain the latest version from this repository:

```bash
git clone [email protected]:bhky/nodding-pigeon.git
cd nodding-pigeon
python3 -m pip install .
```
## Usage

An easy way to try the API and the pre-trained model is to make a short video with your head gesture.

The code snippet below will perform the following:
- Search for the pre-trained weights file in `$HOME/.noddingpigeon/weights/`; if it is not found, download it from this repository.
- Start the webcam.
- Collect the required number of frames (default `60`) for the model.
- Stop the webcam automatically (or press `q` to end earlier).
- Predict the head gesture and print the result.

```python
from noddingpigeon.inference import predict_video

result = predict_video()
print(result)
# Example result:
# {'gesture': 'nodding',
#  'probabilities': {'has_motion': 1.0,
#                    'gestures': {'nodding': 0.9576354622840881,
#                                 'turning': 0.042364541441202164}}}
```
Alternatively, you could provide a pre-recorded video file:

```python
from noddingpigeon.inference import predict_video
from noddingpigeon.video import VideoSegment  # Optional import.

result = predict_video(
    "your_head_gesture_video.mp4",
    video_segment=VideoSegment.LAST,  # Optionally change these parameters.
    motion_threshold=0.5,
    gesture_threshold=0.9
)
```
Note that no matter how long your video is, only a pre-defined number of frames (`60` for the current model) is used for prediction. The `video_segment` enum option controls how the frames are obtained from the video, e.g., `VideoSegment.LAST` means the last `60` frames will be used. The thresholds can be adjusted as needed; see the explanation in the head gestures section below.
## Head gestures

The result is returned as a Python dictionary:

```python
{
  'gesture': 'turning',
  'probabilities': {
    'has_motion': 1.0,
    'gestures': {
      'nodding': 0.009188028052449226,
      'turning': 0.9908120036125183
    }
  }
}
```
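Downstream code can branch on this dictionary directly; for instance (a minimal sketch using the example result above):

```python
# Minimal sketch: acting on the result dictionary shown above.
result = {
    "gesture": "turning",
    "probabilities": {
        "has_motion": 1.0,
        "gestures": {"nodding": 0.009188028052449226,
                     "turning": 0.9908120036125183},
    },
}

gesture = result["gesture"]
if gesture in ("nodding", "turning"):
    probability = result["probabilities"]["gestures"][gesture]
    print(f"Detected {gesture} (probability {probability:.3f}).")
else:
    print(f"No clear head gesture: {gesture}.")
```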
The following `gesture` types are available:

- `nodding` - Repeatedly tilt your head upward and downward.
- `turning` - Repeatedly turn your head leftward and rightward.
- `stationary` - Not tilting or turning your head; translational motion is still treated as stationary.
- `undefined` - Unrecognised gesture, or no landmarks detected (usually means no face is shown).

To determine the final `gesture`:
- If the `has_motion` probability is smaller than `motion_threshold` (default `0.5`), `gesture` is `stationary`, and the other probabilities are irrelevant.
- Otherwise, the largest probability from `gestures` is considered:
  - if it is smaller than `gesture_threshold` (default `0.9`), `gesture` is `undefined`;
  - otherwise, the corresponding gesture label is returned (e.g., `nodding`).
- If no landmarks are detected, `gesture` is `undefined`, and the `probabilities` dictionary is empty.
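The decision rule above can be summarised as follows (a sketch of the logic only; `decide_gesture` is a hypothetical helper, not part of the library API):

```python
# Hypothetical helper illustrating the threshold logic described above.
from typing import Dict, Optional

def decide_gesture(
    has_motion: Optional[float],
    gestures: Dict[str, float],
    motion_threshold: float = 0.5,
    gesture_threshold: float = 0.9,
) -> str:
    if has_motion is None:  # No landmarks detected, e.g., no face shown.
        return "undefined"
    if has_motion < motion_threshold:
        return "stationary"
    label, probability = max(gestures.items(), key=lambda item: item[1])
    return label if probability >= gesture_threshold else "undefined"

print(decide_gesture(1.0, {"nodding": 0.96, "turning": 0.04}))  # nodding
print(decide_gesture(0.2, {"nodding": 0.50, "turning": 0.50}))  # stationary
```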
## API

### `noddingpigeon.inference`

#### `predict_video`

Detect the head gesture shown in the input video, either from a webcam or from a video file.
Parameters:

- `video_path` (`Optional[str]`, default `None`): Path to the video file, or `None` to start a webcam.
- `model` (`Optional[tf.keras.Model]`, default `None`): A custom model for inference, or `None` to use the default model.
- `max_num_frames` (`int`, default `60`): Maximum number of frames from which features are collected for prediction; see the remarks in the usage section.
- `video_segment` (`VideoSegment` enum, default `VideoSegment.BEGINNING`): Controls which part of the video the frames are taken from; see `VideoSegment`.
- `end_padding` (`bool`, default `True`): If `True` and `max_num_frames` is set, the feature sequence is padded at the end when the input video has not enough frames.
- `drop_consecutive_duplicates` (`bool`, default `True`): If `True`, features from a certain frame will not be used to form the feature sequence if they duplicate those of the preceding frame.
- `postprocessing` (`bool`, default `True`): If `True`, the final result will be presented as the Python dictionary described in the head gestures section.
- `motion_threshold` (`float`, default `0.5`): See the head gestures section.
- `gesture_threshold` (`float`, default `0.9`): See the head gestures section.

Return:

- A Python dictionary if `postprocessing` is `True`, otherwise `List[float]`.
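For example, disabling `postprocessing` yields the raw probabilities (a sketch based on the signature above; the file name is a placeholder):

```python
from noddingpigeon.inference import predict_video

# Raw model output as List[float] instead of the result dictionary.
raw = predict_video("your_head_gesture_video.mp4", postprocessing=False)
print(raw)
```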
### `noddingpigeon.video`

#### `VideoSegment`

Enum class for video segment options.

- `VideoSegment.BEGINNING`: Collect the required frames for the model from the beginning of the video.
- `VideoSegment.LAST`: Collect the required frames for the model toward the end of the video.
### `noddingpigeon.model`

#### `make_model`
Create an instance of the model used in this library, optionally with pre-trained weights loaded.
Parameters:

- `weights_path` (`Optional[str]`, default `$HOME/.noddingpigeon/weights/*.h5`): Path to the weights file. If `None`, no weights will be downloaded nor loaded into the model. The environment variable `NODDING_PIGEON_HOME` can also be used to indicate where the `.noddingpigeon/` directory should be located.

Return:

- A `tf.keras.Model` object.
## Model training

For a brief outline of the procedure and full details, see the data collection and model training scripts in the `training` directory.