A selection of 3D control scenarios created in a highly efficient simulator, benchmarked with the A2C algorithm
MIT License
A collection of scenarios and efficient benchmarks for the ViZDoom RL environment.
This repository includes scenario generation scripts, pretrained models, and training and evaluation code. To start a short training run:

```shell
python train_a2c.py --num_frames 100000
```

Note that training this agent to convergence takes between 5 and 10 million frames.
As detailed in the paper, there are a number of scenarios. We include a script, generate_scenarios.sh, in the repo that generates the scenarios below.

Generation takes around 10 minutes, so grab a coffee. If you wish to generate only one scenario, take a look at the script; it should be clear what you need to change.
We include pretrained models in the repo that you can test out, or you can train your own agents from scratch. The evaluation code will output example rollouts for all 64 test scenarios.
Labyrinth

Evaluation:

```shell
SIZE=9
python create_rollout_videos.py --recurrent_policy --num_stack 1 --limit_actions \
  --scenario_dir scenarios/custom_scenarios/labyrinth/$SIZE/test/ \
  --scenario custom_scenario{:003}.cfg --model_checkpoint \
  saved_models/labyrinth_$SIZE\_checkpoint_0198658048.pth.tar \
  --multimaze --num_mazes_test 64
```
Training:

```shell
SIZE=9
python train_a2c.py --scenario custom_scenario{:003}.cfg \
  --recurrent_policy --num_stack 1 --limit_actions \
  --scenario_dir scenarios/custom_scenarios/labyrinth/$SIZE/train/ \
  --test_scenario_dir scenarios/custom_scenarios/labyrinth/$SIZE/test/ \
  --multimaze --num_mazes_train 256 --num_mazes_test 64 --fixed_scenario
```
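The --scenario argument is a Python format template rather than a literal filename: the "003" format spec zero-pads the maze index to width 3. A quick illustration:

```python
# The scenario filename template used in the commands above.
template = "custom_scenario{:003}.cfg"

# The format spec "003" zero-pads the index to 3 digits.
print(template.format(0))   # -> custom_scenario000.cfg
print(template.format(42))  # -> custom_scenario042.cfg
```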
Find and Return

Evaluation:

```shell
SIZE=9
python create_rollout_videos.py --recurrent_policy --num_stack 1 --limit_actions \
  --scenario_dir scenarios/custom_scenarios/find_return/$SIZE/test/ \
  --scenario custom_scenario{:003}.cfg --model_checkpoint \
  saved_models/find_return_$SIZE\_checkpoint_0198658048.pth.tar \
  --multimaze --num_mazes_test 64
```
Training:

```shell
SIZE=9
python train_a2c.py --scenario custom_scenario{:003}.cfg \
  --recurrent_policy --num_stack 1 --limit_actions \
  --scenario_dir scenarios/custom_scenarios/find_return/$SIZE/train/ \
  --test_scenario_dir scenarios/custom_scenarios/find_return/$SIZE/test/ \
  --multimaze --num_mazes_train 256 --num_mazes_test 64 --fixed_scenario
```
K-item

Evaluation:

```shell
NUM_ITEMS=4
python create_rollout_videos.py --recurrent_policy --num_stack 1 --limit_actions \
  --scenario_dir scenarios/custom_scenarios/kitem/$NUM_ITEMS/test/ \
  --scenario custom_scenario{:003}.cfg --model_checkpoint \
  saved_models/$NUM_ITEMS\item_checkpoint_0198658048.pth.tar \
  --multimaze --num_mazes_test 64
```
Training:

```shell
NUM_ITEMS=4
python train_a2c.py --scenario custom_scenario{:003}.cfg \
  --recurrent_policy --num_stack 1 --limit_actions \
  --scenario_dir scenarios/custom_scenarios/kitem/$NUM_ITEMS/train/ \
  --test_scenario_dir scenarios/custom_scenarios/kitem/$NUM_ITEMS/test/ \
  --multimaze --num_mazes_train 256 --num_mazes_test 64 --fixed_scenario
```
Two-color

Evaluation:

```shell
DIFFICULTY=3
python create_rollout_videos.py --recurrent_policy --num_stack 1 --limit_actions \
  --scenario_dir scenarios/custom_scenarios/two_color/$DIFFICULTY/test/ \
  --scenario custom_scenario{:003}.cfg --model_checkpoint \
  saved_models/two_col_p$DIFFICULTY\_checkpoint_0198658048.pth.tar \
  --multimaze --num_mazes_test 64
```
Training:

```shell
DIFFICULTY=3
python train_a2c.py --scenario custom_scenario{:003}.cfg \
  --recurrent_policy --num_stack 1 --limit_actions \
  --scenario_dir scenarios/custom_scenarios/two_color/$DIFFICULTY/train/ \
  --test_scenario_dir scenarios/custom_scenarios/two_color/$DIFFICULTY/test/ \
  --multimaze --num_mazes_train 256 --num_mazes_test 64 --fixed_scenario
```
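To run several difficulty levels back to back, the commands above can be wrapped in a shell loop. A minimal sketch (only the loop is new; the directory layout is the one used above, and each iteration would run the documented train_a2c.py command against the printed directory):

```shell
# Sweep the two_color scenario over difficulty levels 1-3.
for DIFFICULTY in 1 2 3; do
  echo "scenarios/custom_scenarios/two_color/$DIFFICULTY/train/"
done
```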
In the paper we report frames per second in terms of environment interactions. The agents are trained with a frame skip of 4, which means that for each observation the chosen action is repeated for 4 consecutive frames.
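The frame-skip logic can be sketched as follows (a minimal illustration; the env object and its step signature are hypothetical, not this repository's actual environment wrapper):

```python
def frame_skip_step(env, action, skip=4):
    """Repeat `action` for `skip` environment frames, accumulating reward.

    Assumes env.step(action) returns (obs, reward, done) -- an
    illustrative interface, not the repo's real API.
    """
    total_reward, obs, done = 0.0, None, False
    for _ in range(skip):
        obs, reward, done = env.step(action)
        total_reward += reward
        if done:  # stop repeating once the episode ends
            break
    return obs, total_reward, done
```

With a frame skip of 4, one agent decision therefore corresponds to 4 environment interactions, which is why the two frame counts differ by a factor of 4.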
We have traded increased memory usage for higher performance. You can reduce the memory footprint by excluding --fixed_scenario from the command-line arguments, at the cost of a 10% drop in efficiency.
Please cite the following: TODO upon acceptance