Two-Stream-RNN-Pytorch

Modeling Temporal Dynamics and Spatial Configurations of Actions Using Two-Stream Recurrent Neural Networks



A PyTorch implementation of 'Modeling Temporal Dynamics and Spatial Configurations of Actions Using Two-Stream Recurrent Neural Networks'.


Model

Due to time constraints, we train only on the NTU RGB+D dataset. NTU RGB+D labels 25 joints per person, whereas the original paper uses skeletons with 20 joints, so the architectures of the two RNNs have been modified to fit this dataset. See the code for details.
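The joint count directly shapes the spatial RNN's input. As a minimal sketch, assuming the standard NTU RGB+D (Kinect v2) joint ordering and the parent links written in NTU_PARENT below, the code builds a traversal-style joint sequence: a depth-first walk over the skeleton tree that returns to the parent after each subtree, in the spirit of the paper's traversal sequence. The names NTU_PARENT, build_adjacency and traversal_sequence, and the choice of root, are hypothetical and not taken from model.py.

```python
from collections import defaultdict

# 0-based parent of each of the 25 NTU RGB+D joints (-1 marks the root,
# joint 1 "base of spine"). Assumes the standard Kinect v2 joint ordering,
# e.g. index 20 is joint 21 "spine at shoulder", and indices 21-24 are the
# hand tips and thumbs attached to the hands (indices 7 and 11).
NTU_PARENT = [-1, 0, 20, 2, 20, 4, 5, 6, 20, 8, 9, 10,
              0, 12, 13, 14, 0, 16, 17, 18, 1, 7, 7, 11, 11]


def build_adjacency(parent):
    """Undirected adjacency lists built from a parent array."""
    adj = defaultdict(list)
    for child, par in enumerate(parent):
        if par >= 0:
            adj[child].append(par)
            adj[par].append(child)
    return adj


def traversal_sequence(parent, root=0):
    """Depth-first walk that steps back to the parent after each subtree,
    so every bone is crossed twice: once going out, once coming back."""
    adj = build_adjacency(parent)
    seq = [root]

    def visit(node, prev):
        for nxt in sorted(adj[node]):
            if nxt == prev:
                continue
            seq.append(nxt)   # walk out along the bone
            visit(nxt, node)
            seq.append(node)  # walk back to the parent joint

    visit(root, -1)
    return seq


seq = traversal_sequence(NTU_PARENT, root=20)
print(len(seq))  # 49
```

With 25 joints the walk has 2 × 24 + 1 = 49 steps, while the paper's 20-joint skeleton would give 39, so the spatial sequence fed to the RNN is longer on NTU RGB+D.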

[Figure: NTU RGB+D dataset]
[Figure: Model architecture]

There are 2 × 2 ways to combine the spatial RNN and the temporal RNN: Chain + Stacked, Traversal + Stacked, Chain + Hierarchical, and Traversal + Hierarchical (Chain and Traversal are the two spatial variants; Stacked and Hierarchical are the two temporal ones). See model.py for how to combine them.
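Whichever pair you choose, each stream ends in its own class scores, and the two must be merged into one prediction. The sketch below assumes simple late fusion, a weighted average of the two streams' softmax probabilities with a hypothetical weight w; it illustrates the idea and is not necessarily how model.py or train.py merges the streams.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_streams(spatial_logits, temporal_logits, w=0.5):
    """Weighted late fusion of the two streams' class probabilities.

    w is a hypothetical weight for the spatial stream; the temporal stream
    gets 1 - w. Returns (fused probabilities, predicted class index).
    """
    p_s = softmax(spatial_logits)
    p_t = softmax(temporal_logits)
    fused = [w * a + (1.0 - w) * b for a, b in zip(p_s, p_t)]
    pred = max(range(len(fused)), key=fused.__getitem__)
    return fused, pred


# Toy scores for 3 classes: the streams disagree and fusion picks class 1.
fused, pred = fuse_two_streams([2.0, 1.0, 0.0], [0.0, 3.0, 0.0])
print(pred)  # 1
```

Averaging probabilities rather than raw logits keeps the two streams on a comparable scale even if one produces much larger scores.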

  • Hot issue (6/17): I tried to train this model and found that the loss would not decrease because of some bugs. The model code itself should still be mostly correct. [Fixed]

Dataset

Downloading the dataset

  • We train the model only on the NTU RGB+D dataset, which is large enough for the model to learn robust features.
  • Download it from Baidu Netdisk: NTU RGB+D dataset (extraction code: vnas)

Dataset decomposition

  • Put the dataset file in the same directory as the code and run conver_data.py. If any errors occur, check lines 111 to 113 of that script; you may need to change the data path there. The conversion should take at least an hour, so feel free to go play games in the meantime.
  • Once the conversion succeeds, it generates a directory named NTU-RGB-D with two subdirectories, xsub and xview (the cross-subject and cross-view benchmarks). Each of them contains the converted .npy and .pkl files (data, labels, and frame counts).
  • Finally, build your data loader like this:
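import torch

# Feeder is the dataset class shipped with this repo; the module name below
# is an assumption, so point the import at the file that defines Feeder.
from feeder import Feeder

data_path = "NTU-RGB-D/xview/val_data.npy"
label_path = "NTU-RGB-D/xview/val_label.pkl"
num_frame_path = "NTU-RGB-D/xview/val_num_frame.npy"

# No random augmentation in this configuration.
dataset = Feeder(data_path, label_path, num_frame_path,
                 random_valid_choose=False,
                 random_shift=False,
                 random_move=False,
                 window_size=100,
                 normalization=True,
                 debug=False,
                 origin_transfer=False)

loader = torch.utils.data.DataLoader(
    dataset=dataset,
    batch_size=1,
    shuffle=True,
    num_workers=8,
    pin_memory=True)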

3D Transforms

To apply 3D transforms as data augmentation, enable random_shift, random_move, and random_rotate:

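import torch

# Feeder is the dataset class shipped with this repo; the module name below
# is an assumption, so point the import at the file that defines Feeder.
from feeder import Feeder

data_path = "NTU-RGB-D/xview/val_data.npy"
label_path = "NTU-RGB-D/xview/val_label.pkl"
num_frame_path = "NTU-RGB-D/xview/val_num_frame.npy"

# 3D transforms enabled: random shift, random move, and random rotation.
dataset = Feeder(data_path, label_path, num_frame_path,
                 random_valid_choose=False,
                 random_shift=True,
                 random_move=True,
                 random_rotate=0.2,
                 window_size=100,
                 normalization=False,
                 debug=False,
                 origin_transfer=False)

loader = torch.utils.data.DataLoader(
    dataset=dataset,
    batch_size=1,
    shuffle=True,
    num_workers=8,
    pin_memory=True)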
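To illustrate what a random rotation does to a skeleton, here is a minimal sketch that rotates every joint of a clip about the vertical (y) axis by one angle drawn uniformly from [-max_angle, max_angle]. Reading random_rotate=0.2 as a maximum angle in radians, and choosing the y axis, are assumptions; check Feeder for the actual semantics. rotate_y and random_rotate are hypothetical helpers.

```python
import math
import random


def rotate_y(point, theta):
    """Rotate one (x, y, z) point about the y axis by theta radians."""
    x, y, z = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x + s * z, y, -s * x + c * z)


def random_rotate(sequence, max_angle=0.2, rng=random):
    """Apply one random y-axis rotation to every joint of every frame.

    sequence: list of frames, each a list of (x, y, z) joint positions.
    A single angle for the whole clip keeps the motion rigid, so bone
    lengths and the temporal dynamics are preserved.
    """
    theta = rng.uniform(-max_angle, max_angle)
    return [[rotate_y(p, theta) for p in frame] for frame in sequence]


clip = [[(1.0, 0.5, 0.0), (0.0, 1.0, 2.0)],
        [(0.3, 0.2, 0.1), (1.0, 1.0, 1.0)]]
rotated = random_rotate(clip, max_angle=0.2, rng=random.Random(0))
```

Passing a seeded random.Random makes the augmentation reproducible when debugging.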

Train

Set up training hyperparameters

  • Sorry for the messy training code. Lines 35 to 37 of train.py set some basic hyperparameters; you can simply keep my values. If you change the number of epochs, also adjust the multistep scheduler at line 116.
  • Lines 84 to 111 of train.py let you choose among the different combinations of the two-stream model. This is easy to follow if you have read the paper.
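The scheduler warning matters because a multistep schedule multiplies the learning rate by gamma at fixed epoch milestones, so milestones chosen for one run length may fall after the last epoch of a shorter run. The sketch below reproduces that rule in plain Python (it mirrors how torch.optim.lr_scheduler.MultiStepLR decays once for every milestone reached) and rescales milestones proportionally. The base learning rate, milestones, and gamma are illustrative values, not the ones in train.py, and both helpers are hypothetical.

```python
from bisect import bisect_right


def multistep_lr(base_lr, milestones, gamma, epoch):
    """Learning rate at a given epoch under a multistep schedule:
    decayed by gamma once for every milestone <= epoch."""
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)


def rescale_milestones(milestones, old_epochs, new_epochs):
    """Move milestones proportionally when the total epoch count changes."""
    return [max(1, round(m * new_epochs / old_epochs)) for m in milestones]


# Illustrative values only (not taken from train.py).
milestones = [30, 60]
print(multistep_lr(0.01, milestones, 0.1, 29))  # 0.01
print(rescale_milestones(milestones, old_epochs=90, new_epochs=30))  # [10, 20]
```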

Test

Download released checkpoints

  • Download the released checkpoints.
  • There are two checkpoints, both trained on the cross-view (xview) split: one for the two-stream RNN without 3D transforms and one for the two-stream RNN with 3D transforms.
  • Set the checkpoint path at line 78 of test.py.
  • Run test.py.
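Skeleton action recognition on NTU RGB+D is usually reported as top-1 classification accuracy on the chosen benchmark. As a sketch of that metric, assuming one list of class scores per test sample, the hypothetical helper below computes top-k accuracy; it is not necessarily what test.py computes.

```python
def topk_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores.

    scores: list of per-class score lists, one per sample.
    labels: list of integer class indices, aligned with scores.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        return 0.0
    hits = 0
    for row, label in zip(scores, labels):
        top = sorted(range(len(row)), key=row.__getitem__, reverse=True)[:k]
        hits += label in top
    return hits / len(labels)


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 2]
print(topk_accuracy(scores, labels))       # 2 of 3 correct
print(topk_accuracy(scores, labels, k=2))  # all 3 within the top 2
```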

Result

Email Address

Email me if you have any urgent issues.