# Nodding Pigeon

Detection and classification of head gestures in videos.

MIT License.
The Nodding Pigeon library provides a pre-trained model and a simple inference API for detecting head gestures in short videos. Under the hood, it uses Google MediaPipe for collecting the landmark features.
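As a rough illustration of that last point, here is a sketch (not the library's internal implementation; the keypoint handling is an assumption) of how per-frame landmark features can be obtained with MediaPipe's face-detection solution:

```python
# Sketch only: collecting per-frame face landmarks with MediaPipe.
# This is NOT the library's internal implementation.
import cv2
import mediapipe as mp

cap = cv2.VideoCapture(0)  # Webcam; a video file path also works.
with mp.solutions.face_detection.FaceDetection(model_selection=0) as detector:
    ok, frame_bgr = cap.read()
    if ok:
        # MediaPipe expects RGB input.
        results = detector.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
        if results.detections:
            # Six relative keypoints (eyes, nose tip, mouth, ears) per face.
            keypoints = results.detections[0].location_data.relative_keypoints
            print([(round(kp.x, 3), round(kp.y, 3)) for kp in keypoints])
cap.release()
```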
## Installation

Tested for Python 3.8, 3.9, and 3.10.

The best way to install this library with its dependencies is from PyPI:

```bash
python3 -m pip install --upgrade noddingpigeon
```

Alternatively, to obtain the latest version from this repository:

```bash
git clone [email protected]:bhky/nodding-pigeon.git
cd nodding-pigeon
python3 -m pip install .
```
## Usage

An easy way to try the API and the pre-trained model is to make a short video with your head gesture.

The code snippet below will perform the following:
- Search for the pre-trained weights file in `$HOME/.noddingpigeon/weights/`; if it is not found, download it from this repository.
- Start the webcam.
- Collect the required number of frames (default `60`) for the model.
- Stop the webcam automatically (or press `q` to end earlier).
- Predict the head gesture and print the result.

```python
from noddingpigeon.inference import predict_video

result = predict_video()
print(result)
# Example result:
# {'gesture': 'nodding',
#  'probabilities': {'has_motion': 1.0,
#                    'gestures': {'nodding': 0.9576354622840881,
#                                 'turning': 0.042364541441202164}}}
```
Alternatively, you could provide a pre-recorded video file:

```python
from noddingpigeon.inference import predict_video
from noddingpigeon.video import VideoSegment  # Optional import.

result = predict_video(
    "your_head_gesture_video.mp4",
    video_segment=VideoSegment.LAST,  # Optionally change these parameters.
    motion_threshold=0.5,
    gesture_threshold=0.9
)
```
Note that no matter how long your video is, only a pre-defined number of frames (`60` for the current model) is used for prediction. The `video_segment` enum option controls how the frames are obtained from the video, e.g., `VideoSegment.LAST` means the last `60` frames will be used. The thresholds can be adjusted as needed; see the explanation in the head gestures section below.
## Head gestures

The result is returned as a Python dictionary:

```python
{
  'gesture': 'turning',
  'probabilities': {
    'has_motion': 1.0,
    'gestures': {
      'nodding': 0.009188028052449226,
      'turning': 0.9908120036125183
    }
  }
}
```
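Downstream code can branch on this dictionary directly; for instance (a minimal sketch using the example result above):

```python
# Minimal sketch: acting on the result dictionary shown above.
result = {
    "gesture": "turning",
    "probabilities": {
        "has_motion": 1.0,
        "gestures": {"nodding": 0.009188028052449226,
                     "turning": 0.9908120036125183},
    },
}

gesture = result["gesture"]
if gesture in ("nodding", "turning"):
    probability = result["probabilities"]["gestures"][gesture]
    print(f"Detected {gesture} (probability {probability:.3f}).")
else:
    print(f"No clear head gesture: {gesture}.")
```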
The following `gesture` types are available:

- `nodding` - Repeatedly tilt your head upward and downward.
- `turning` - Repeatedly turn your head leftward and rightward.
- `stationary` - Not tilting or turning your head; translational motion is still treated as stationary.
- `undefined` - Unrecognised gesture, or no landmarks detected (usually means no face is shown).

To determine the final `gesture`:
- If the `has_motion` probability is smaller than `motion_threshold` (default `0.5`), `gesture` is `stationary`, and the other probabilities are irrelevant.
- Otherwise, the largest probability from `gestures` is considered:
  - if it is smaller than `gesture_threshold` (default `0.9`), `gesture` is `undefined`;
  - otherwise, the corresponding gesture label is returned (e.g., `nodding`).
- If no landmarks are detected, `gesture` is `undefined`, and the `probabilities` dictionary is empty.
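The decision rule above can be summarised as follows (a sketch of the logic only; `decide_gesture` is a hypothetical helper, not part of the library API):

```python
# Hypothetical helper illustrating the threshold logic described above.
from typing import Dict, Optional

def decide_gesture(
    has_motion: Optional[float],
    gestures: Dict[str, float],
    motion_threshold: float = 0.5,
    gesture_threshold: float = 0.9,
) -> str:
    if has_motion is None:  # No landmarks detected, e.g., no face shown.
        return "undefined"
    if has_motion < motion_threshold:
        return "stationary"
    label, probability = max(gestures.items(), key=lambda item: item[1])
    return label if probability >= gesture_threshold else "undefined"

print(decide_gesture(1.0, {"nodding": 0.96, "turning": 0.04}))  # nodding
print(decide_gesture(0.2, {"nodding": 0.50, "turning": 0.50}))  # stationary
```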
## API

### `noddingpigeon.inference`

#### `predict_video`

Detect the head gesture shown in the input video, either from a webcam or from a video file.
Parameters:

- `video_path` (`Optional[str]`, default `None`): Path to the video file, or `None` to start a webcam.
- `model` (`Optional[tf.keras.Model]`, default `None`): A custom model for inference, or `None` to use the default model.
- `max_num_frames` (`int`, default `60`): Maximum number of frames from which features are collected for prediction; see the remarks in the usage section.
- `video_segment` (`VideoSegment` enum, default `VideoSegment.BEGINNING`): Controls which part of the video the frames are taken from; see `VideoSegment`.
- `end_padding` (`bool`, default `True`): If `True` and `max_num_frames` is set, the feature sequence is padded at the end when the input video has not enough frames.
- `drop_consecutive_duplicates` (`bool`, default `True`): If `True`, features from a certain frame will not be used to form the feature sequence if they duplicate those of the preceding frame.
- `postprocessing` (`bool`, default `True`): If `True`, the final result will be presented as the Python dictionary described in the head gestures section.
- `motion_threshold` (`float`, default `0.5`): See the head gestures section.
- `gesture_threshold` (`float`, default `0.9`): See the head gestures section.

Return:

- A Python dictionary if `postprocessing` is `True`, otherwise `List[float]`.
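For example, disabling `postprocessing` yields the raw probabilities (a sketch based on the signature above; the file name is a placeholder):

```python
from noddingpigeon.inference import predict_video

# Raw model output as List[float] instead of the result dictionary.
raw = predict_video("your_head_gesture_video.mp4", postprocessing=False)
print(raw)
```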
### `noddingpigeon.video`

#### `VideoSegment`

Enum class for video segment options.

- `VideoSegment.BEGINNING`: Collect the required frames for the model from the beginning of the video.
- `VideoSegment.LAST`: Collect the required frames for the model toward the end of the video.
### `noddingpigeon.model`

#### `make_model`
Create an instance of the model used in this library, optionally with pre-trained weights loaded.
Parameters:

- `weights_path` (`Optional[str]`, default `$HOME/.noddingpigeon/weights/*.h5`): Path to the weights file. If `None`, no weights will be downloaded nor loaded into the model. The environment variable `NODDING_PIGEON_HOME` can also be used to indicate where the `.noddingpigeon/` directory should be located.

Return:

- A `tf.keras.Model` object.
## Model training

For a brief outline of the procedure and full details, see the data collection and model training scripts in the `training` directory.