Open source reproduction in PyTorch of "Neural Predictor for Neural Architecture Search".
MIT License
Wei Wen, Hanxiao Liu, Hai Li, Yiran Chen, Gabriel Bender, Pieter-Jan Kindermans. "Neural Predictor for Neural Architecture Search". arXiv:1912.00848.
All results are obtained with the hyper-parameters provided in the paper (the default values in `train.py`), unless otherwise specified.
The following results are MSE (mean squared error); lower is better.
| Train Split | Eval Split | Paper | Reproduction | Comments |
|---|---|---|---|---|
| 172 | all | 1.95 | 3.62 | |
| 860 | all | NA | 2.94 | |
| 172 | denoise-80 | NA | 1.90 | |
| 91-172 | denoise-91 | 0.66 | 0.74 | Paper used a classifier to denoise |
| 91-172 | denoise-91 | NA | 0.56 | epochs = 600, lr = 2e-4 |
NOTE: As the classifier is not ready, we cheated a little by directly filtering out all the architectures below 91%. These splits are prefixed with `91-`.
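For reference, the metric reported in the table above is plain mean squared error between predicted and ground-truth accuracies. A minimal sketch (the function name and sample values are ours, for illustration only):

```python
def mse(pred, target):
    """Mean squared error between predicted and true accuracies (in %)."""
    assert len(pred) == len(target), "prediction/target length mismatch"
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

# Hypothetical predictions vs. ground-truth validation accuracies.
print(mse([93.1, 91.8], [92.0, 92.5]))  # -> 0.85
```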
Download the HDF5 version of NasBench from here and put it under `data`.
Then generate train/eval split:
python tools/split_train_val.py
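Conceptually, generating a split amounts to sampling a fixed number of architecture indices without replacement; the rest form the eval pool. This is a hedged sketch of that idea, not the actual contents of `tools/split_train_val.py` (function name, seeding, and details are assumptions):

```python
import random

def make_split(num_archs, train_size, seed=0):
    """Sample `train_size` architecture indices for training;
    the remaining indices form the eval split."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    indices = list(range(num_archs))
    rng.shuffle(indices)
    return indices[:train_size], indices[train_size:]

# NasBench-101 contains 423624 architectures; 172 is one of the train splits.
train_idx, eval_idx = make_split(423624, 172)
```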
Skip this step if you have downloaded the data from the last step.
This step converts the tfrecord into an HDF5 file, as the official asset Google provides is too slow to read (and very large in volume).
Download `nasbench_full.tfrecord` from NasBench and put it under `data`. Then run:

python tools/nasbench_tfrecord_converter.py
The following splits are provided for now:

- `172`, `334`, `860`: randomly sampled architectures from NasBench.
- `91-172`, `91-334`, `91-860`: the splits above filtered with a threshold (validation accuracy 91% on seed 0).
- `denoise-91`, `denoise-80`: all architectures filtered with thresholds of 91% and 80%, respectively.
- `all`.

Refer to `python train.py -h` for options. Training and evaluation are very fast (about 90 seconds on a P100).
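The threshold filtering behind the `91-*` and `denoise-*` splits can be sketched as follows (a hedged illustration; the function name and the `val_accs` input are assumptions, not code from this repo):

```python
def filter_by_accuracy(val_accs, threshold=0.91):
    """Keep indices of architectures whose seed-0 validation accuracy
    meets the threshold (e.g. 0.91 for the `91-*` splits)."""
    return [i for i, acc in enumerate(val_accs) if acc >= threshold]

# Toy example with four architectures: only #0 and #2 survive at 91%.
kept = filter_by_accuracy([0.95, 0.80, 0.92, 0.60])
```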
The HDF5 file is quite self-explanatory; you can refer to `dataset.py` for how to read it. The only thing worth highlighting is that `metrics` is a 423624 x 4 (epochs: 4, 12, 36, 108) x 3 (seeds: 0, 1, 2) x 2 (halfway, total) x 4 (`training_time`, `train_accuracy`, `validation_accuracy`, `test_accuracy`) array.
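To make the axis layout concrete, here is how one entry could be indexed, using a dummy NumPy array with the documented shape (only 2 architectures instead of 423624; the real file is read via `dataset.py`):

```python
import numpy as np

# Dummy stand-in for the `metrics` dataset with the documented axis layout.
metrics = np.zeros((2, 4, 3, 2, 4))

EPOCHS = [4, 12, 36, 108]  # axis 1: training budgets
METRIC_NAMES = ["training_time", "train_accuracy",
                "validation_accuracy", "test_accuracy"]  # axis 4

arch, seed = 0, 0
epoch_idx = EPOCHS.index(108)    # final training budget
halfway_or_total = 1             # axis 3: 0 = halfway, 1 = end of training
val_idx = METRIC_NAMES.index("validation_accuracy")

val_acc = metrics[arch, epoch_idx, seed, halfway_or_total, val_idx]
```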
A brief case study reveals that the bad results are mainly due to the "noise" in NasBench. In NasBench, there are two types of noise: