🤙 Hand and body gesture classifier tool - Creation of datasets, real-time visualization, and processing pipeline deployment!
MIT License
This Python package focuses on the deployment of gesture control systems. It eases dataset creation, model evaluation, and processing pipeline deployment. The critical element in the proposed processing architecture is the intermediate representation of human bodies as keypoints to perform efficient classification. In addition to the main application, the package contains two datasets for body/hand pose classification, several classification models, and data augmentation tools that can be accessed through an API. Feel free to check out the drone-gesture-control repository for a deployment example on Jetson Nano using this package.
Run the following command to install the whole package in the desired Python environment:
pip install pose-classification-kit[app]
If you don't plan to use the application but just want access to the datasets and pre-trained models:
pip install pose-classification-kit
Ensure that Poetry is installed for Python 3.7 and above on your system.
Git clone the repository
git clone https://github.com/ArthurFDLR/pose-classification-kit.git
cd pose-classification-kit
Create an adequate venv virtual environment
python -m poetry install
The dataset creation and real-time model evaluation application relies heavily on the pose estimation system OpenPose. It must be installed on your system to allow real-time gesture classification. This step is not required if you don't plan to use the application.
Once the installation is completed, change the variable OPENPOSE_PATH (.\pose-classification-kit\config.py) to the location of the OpenPose installation folder on your system.
You should now be able to run the application if you installed all optional dependencies. See the usage section for details on how to use the app.
pose-classification-app
The .\examples folder contains Jupyter Notebooks detailing the use of the API to create new classification models. Note that these Notebooks can be executed on Google Colab.
The video stream of the selected camera is fed to OpenPose at all times. The analysis results are displayed on the left side of the application. You have to choose one of the available models in the drop-down at the bottom of the analysis panel. Keypoints extracted from the video by OpenPose are automatically normalized and fed to the classifier.
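The normalization method is not detailed here. As a rough illustration only, the sketch below assumes that keypoints are translated to the origin of their bounding box and scaled by its larger side, so that classification does not depend on where the person stands in the frame or how large they appear. `normalize_keypoints` is a hypothetical helper, not a function of this package.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def normalize_keypoints(keypoints: List[Point]) -> List[Point]:
    """Translate keypoints to the bounding-box origin and scale them to [0, 1].

    Assumption: this is an illustrative normalization, not necessarily the
    one applied by the application.
    """
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    x_min, y_min = min(xs), min(ys)
    # Divide by the larger side so the aspect ratio of the pose is preserved.
    scale = max(max(xs) - x_min, max(ys) - y_min)
    if scale == 0:
        # Degenerate case: all keypoints coincide.
        return [(0.0, 0.0) for _ in keypoints]
    return [((x - x_min) / scale, (y - y_min) / scale) for x, y in keypoints]


# The same pose seen at two different positions and sizes in the image
pose_near = [(100.0, 200.0), (300.0, 200.0), (200.0, 600.0)]
pose_far = [(10.0, 20.0), (30.0, 20.0), (20.0, 60.0)]
normalized_near = normalize_keypoints(pose_near)
normalized_far = normalize_keypoints(pose_far)
```

Under this assumption, both poses map to the same normalized coordinates, which is the property a classifier working on keypoints benefits from.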
First, you have to either load an existing set of samples or create a new one for a specific label and hand side. To do so, choose Open (Ctrl+O) or Create new (Ctrl+N), respectively, in the Dataset entry of the menu bar. You have to specify the hand side, the label, and the accuracy threshold of the newly created sample set; when creating a new class, a configuration window asks for the label and the accuracy threshold. The accuracy threshold defines the minimum accuracy of OpenPose hand keypoint detection required of every sample in the set. This accuracy is displayed on top of the keypoints graph.
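To illustrate the role of the accuracy threshold, here is a minimal sketch in which a sample is only recorded when its detection accuracy reaches the threshold of the set. The `Sample` and `SampleSet` classes are hypothetical names used for illustration, the label is arbitrary, and the accuracy is assumed to lie between 0 and 1.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Sample:
    keypoints: List[Tuple[float, float]]
    accuracy: float  # assumption: detection accuracy expressed in [0, 1]


@dataclass
class SampleSet:
    label: str
    hand_side: str
    accuracy_threshold: float
    samples: List[Sample] = field(default_factory=list)

    def record(self, sample: Sample) -> bool:
        """Add the sample only if it meets the accuracy threshold of the set."""
        if sample.accuracy < self.accuracy_threshold:
            return False
        self.samples.append(sample)
        return True


sample_set = SampleSet(label="Fist", hand_side="right", accuracy_threshold=0.8)
accepted = sample_set.record(Sample(keypoints=[(0.1, 0.2)], accuracy=0.93))
rejected = sample_set.record(Sample(keypoints=[(0.1, 0.2)], accuracy=0.41))
```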
Now that a set is loaded in the application, you can record new samples from your video feed or inspect the set and delete inadequate samples. When you're done, save the set through Dataset -> Save (Ctrl+S).
Some functionalities are currently unavailable through the GUI:
You can export all dataset samples from .\pose_classification_kit\datasets\Body and .\pose_classification_kit\datasets\Hands in two respective CSV files.
export-datasets
You can generate videos similar to this one (.\pose-classification-kit\scripts\video_creation.py might need some modification to fit your use case).
Currently not functional
video-overlay
There are 20 body dataset classes, each containing between 500 and 600 samples, for a total of 10680 entries. Even though the number of samples varies from one class to another in the raw dataset, the API yields a balanced dataset of 503 samples per class. Also, by default, 20% of these are reserved for final testing of the model. Each entry in the dataset is an array of 25 2D coordinates. The mapping of these keypoints follows the BODY25 body model. We created the dataset using the BODY25 representation as it is one of the most comprehensive standard body models. However, some pose estimation models, such as the one used on the Jetson Nano, use an 18-keypoint representation (BODY18). The seven missing keypoints do not strongly influence classification, as six of them represent the feet and the last one is a central hip keypoint. Still, the dataset must be converted to the BODY18 representation. This is done by reindexing the samples based on a comparison of the mappings of both body models. You can choose which body model to use when importing the dataset with the API.
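The reindexing step can be sketched as follows. The keypoint names and orderings below are shortened, made-up mappings chosen for illustration and are not the real BODY25 or BODY18 layouts; `reindex_samples` is likewise a hypothetical helper rather than part of the package API.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Illustrative mappings only (assumption): the real models have 25 and 18 keypoints.
SOURCE_MAPPING = ["nose", "neck", "mid_hip", "left_knee", "left_big_toe", "right_knee"]
TARGET_MAPPING = ["nose", "left_knee", "right_knee", "neck"]


def reindex_samples(
    samples: Sequence[Sequence[Point]],
    source_mapping: Sequence[str],
    target_mapping: Sequence[str],
) -> List[List[Point]]:
    """Reorder and drop keypoints so that each sample follows target_mapping."""
    # Look up where each keypoint name sits in the source representation.
    source_index = {name: i for i, name in enumerate(source_mapping)}
    # For each target keypoint, find its position in the source sample.
    indices = [source_index[name] for name in target_mapping]
    return [[sample[i] for i in indices] for sample in samples]


samples = [[(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (-0.5, 3.0), (-0.5, 4.0), (0.5, 3.0)]]
converted = reindex_samples(samples, SOURCE_MAPPING, TARGET_MAPPING)
```

Keypoints that exist only in the source mapping (here `mid_hip` and `left_big_toe`) are simply dropped, which mirrors how the feet and central hip keypoints disappear in the conversion described above.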
The data augmentation tool currently supports the following operations:
import numpy as np
from pose_classification_kit.datasets import BODY18, bodyDataset, dataAugmentation
dataset = bodyDataset(testSplit=.2, shuffle=True, bodyModel=BODY18)
x_train = dataset['x_train']
y_train = dataset['y_train_onehot']
x, y = [x_train], [y_train]
# Scaling augmentation
x[len(x):],y[len(y):] = tuple(zip(dataAugmentation(
x_train, y_train,
augmentation_ratio=.1,
scaling_factor_standard_deviation=.08,
)))
# Rotation augmentation
x[len(x):],y[len(y):] = tuple(zip(dataAugmentation(
x_train, y_train,
augmentation_ratio=.1,
rotation_angle_standard_deviation=10,
)))
# Upper-body augmentation
lowerBody_keypoints = np.where(np.isin(BODY18.mapping,[
"left_knee", "right_knee", "left_ankle", "right_ankle"
]))[0]
x[len(x):],y[len(y):] = tuple(zip(dataAugmentation(
x_train, y_train,
augmentation_ratio=.15,
remove_specific_keypoints=lowerBody_keypoints,
random_noise_standard_deviation=.03
)))
lowerBody_keypoints = np.where(np.isin(BODY18.mapping,[
"left_knee", "right_knee", "left_ankle", "right_ankle", "left_hip", "right_hip",
]))[0]
x[len(x):],y[len(y):] = tuple(zip(dataAugmentation(
x_train, y_train,
augmentation_ratio=.15,
remove_specific_keypoints=lowerBody_keypoints,
random_noise_standard_deviation=.03
)))
# Random partial input augmentation
x[len(x):],y[len(y):] = tuple(zip(dataAugmentation(
x_train, y_train,
augmentation_ratio=.2,
remove_rand_keypoints_nbr=2,
random_noise_standard_deviation=.03
)))
x_train_augmented = np.concatenate(x, axis=0)
y_train_augmented = np.concatenate(y, axis=0)
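For intuition about what the scaling and rotation parameters control, the following sketch applies a random scale and rotation to the 2D keypoints of one sample. It is a simplified stand-in written for illustration and not the implementation of `dataAugmentation`; in particular, transforming about the origin and interpreting the angle in degrees are assumptions, and `augment_sample` is a hypothetical name.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def augment_sample(
    keypoints: Sequence[Point],
    scaling_std: float,
    rotation_std_deg: float,
    rng: random.Random,
) -> List[Point]:
    """Return a randomly scaled and rotated copy of the keypoints.

    Assumptions: the scale factor is drawn from N(1, scaling_std) and the
    angle from N(0, rotation_std_deg) in degrees, both applied about the origin.
    """
    scale = rng.gauss(1.0, scaling_std)
    angle = math.radians(rng.gauss(0.0, rotation_std_deg))
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [
        (scale * (cos_a * x - sin_a * y), scale * (sin_a * x + cos_a * y))
        for x, y in keypoints
    ]


rng = random.Random(0)  # fixed seed so the example is reproducible
original = [(0.0, 1.0), (1.0, 0.0), (-1.0, 0.0)]
augmented = augment_sample(original, scaling_std=0.08, rotation_std_deg=10, rng=rng)
```

Because a single scale and angle are drawn per sample, the relative geometry of the pose is preserved while its size and orientation vary slightly.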
Distributed under the MIT License. See LICENSE
for more information.