Two-Stream-RNN-Pytorch

The Pytorch Implementation of 'Modeling Temporal Dynamics and Spatial Configurations of Actions UsingTwo-Stream Recurrent Neural Networks'

Model

due to the limitation of time, we will only train on NTU RGB+D dataset . 25 joints are labeled for each person, and the original paper only have 20 joints for each skeleton, so the architecture of the two rnn have been modeified to fit the dataset. Check the code for details.


NTU RGB+D dataset
Model architecture

Basically, there are 2x2 combination ways between the Spatial RNN and Temporal RNN, which are Chain + Stacked, Traversal + Stacked, Chain + Hierarchical, Traversal + Hierarchical. See codes down below in the model.py and you will know how to combine them

*Hot Issue 6/17 I tried to train this model but find the loss won't drop. Some bugs exist. Yet still the model code should be correct mostly [Fixed]

Dataset

Downloading the dataset

We will train our model only on NTU RGB+D dataset which is already big enough for the model to gain robustness
Download it with BaiduNetdisk NTU RGB+D dataset(:vnas)

Dataset decomposition

Put the dataset file with the code and run conver_data.py. If any errors occur, check the code from line 111 to line 113 and you may
change the data path. This Process should take at least 1 hour I guess?. You can go to play games now
Then when you successfully decomposed the dataset file. It should generat a file named NTU-RGB-D which consist of two subfiles named
xsub and xview. There are three .npy and .pkl files in each of them.
Finaly, build your dataloader like this

data_path = "NTU-RGB-D/xview/val_data.npy"
    label_path = "NTU-RGB-D/xview/val_label.pkl"
    num_frame_path = "NTU-RGB-D/xview/val_num_frame.npy"
    dataset = Feeder(data_path, label_path, num_frame_path,random_valid_choose=False,
                     random_shift=False,
                     random_move=False,
                     window_size=100,
                     normalization=True,
                     debug=False,
                     origin_transfer=False)
                     
loader = torch.utils.data.DataLoader(
    dataset=dataset,
    batch_size=1,
    shuffle=True,
    num_workers=8, pin_memory=True)

3D Transforms

data_path = "NTU-RGB-D/xview/val_data.npy"
    label_path = "NTU-RGB-D/xview/val_label.pkl"
    num_frame_path = "NTU-RGB-D/xview/val_num_frame.npy"
    dataset = Feeder(data_path, label_path, num_frame_path,random_valid_choose=False,
                     random_shift=True,
                     random_move=True,
                     random_rotate=0.2,
                     window_size=100,
                     normalization=False,
                     debug=False,
                     origin_transfer=False)
                     
loader = torch.utils.data.DataLoader(
    dataset=dataset,
    batch_size=1,
    shuffle=True,
    num_workers=8, pin_memory=True)

Train

Set up training hyperparameters

Sorry for my messy training code. In train.py from line 35 to line 37, there are some basic Hyperparameters to modify. You can simply just use mine. And if you want
to modify the number of epochs, watch out the multistep scheduler at line 116
In train.py from line 84 to line 111, you can choose the different conbination of the two stream model. It will be easy to understand if you had checked the paper

Test

Download released checkpoints

download the checkpoints released
there are two checkpoints, one is for two-stream RNN without 3D transforms on cross view dataset and another is for two-stream RNN with 3D transforms on cross view dataset
At line78 in test.py, you can change the path of the two checkpoints
run test.py