This repo contains the source code for the blog post "The 37 Implementation Details of Proximal Policy Optimization".
If you like this repo, consider checking out CleanRL (https://github.com/vwxyzjn/cleanrl), the RL library that we used to build this repo.
Prerequisites: Python and Poetry (https://python-poetry.org).
Install dependencies:
poetry install
Train agents:
poetry run python ppo.py
Train agents with experiment tracking:
poetry run python ppo.py --track --capture-video
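ppo.py implements the core PPO details from the blog post, one of which is Generalized Advantage Estimation (GAE). The pure-Python sketch below shows the backward recursion for a single environment's rollout. It is an illustration under stated assumptions, not the code in ppo.py. The helper name compute_gae, the list-based inputs, and the dones convention (dones[t] == 1.0 when the episode ended after the action at step t) are our own choices. The defaults gamma=0.99 and gae_lambda=0.95 are assumed typical values.

```python
from typing import List, Sequence, Tuple


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[float],
    last_value: float,
    gamma: float = 0.99,
    gae_lambda: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """Compute GAE advantages and value targets for one environment's rollout.

    Assumed convention (hypothetical helper): dones[t] == 1.0 if the episode
    ended after the action taken at step t, else 0.0. last_value is the
    critic's estimate for the observation that follows the final step.
    """
    num_steps = len(rewards)
    advantages = [0.0] * num_steps
    last_gae = 0.0
    # Walk the rollout backwards, accumulating discounted TD errors.
    for t in reversed(range(num_steps)):
        next_value = last_value if t == num_steps - 1 else values[t + 1]
        not_done = 1.0 - dones[t]  # stop bootstrapping across episode ends
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        last_gae = delta + gamma * gae_lambda * not_done * last_gae
        advantages[t] = last_gae
    # Value targets ("returns") are the advantages plus the baseline values.
    returns = [adv + v for adv, v in zip(advantages, values)]
    return advantages, returns
```

For example, with gamma = gae_lambda = 1, zero values, rewards [1, 1, 1], and the episode ending at the last step, the advantages are [3.0, 2.0, 1.0]. This is plain reward-to-go, because the final done flag cuts off the bootstrapped last_value.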
Atari:
Install dependencies:
poetry install -E atari
Train agents:
poetry run python ppo_atari.py
Train agents with experiment tracking:
poetry run python ppo_atari.py --track --capture-video
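Among the Atari-specific details in the blog post are clipping rewards by their sign (ClipRewardEnv) and stacking the most recent 4 frames (FrameStack). The sketch below mimics those two ideas in pure Python for illustration only. clip_reward and this FrameStack class are hypothetical stand-ins, not the wrappers ppo_atari.py actually uses. Frames are arbitrary objects here instead of image arrays.

```python
from collections import deque
from typing import Any, Deque, List


def clip_reward(reward: float) -> float:
    """Bin a raw game reward to -1.0, 0.0, or +1.0 by its sign."""
    if reward > 0:
        return 1.0
    if reward < 0:
        return -1.0
    return 0.0


class FrameStack:
    """Keep the k most recent frames as the agent's observation (simplified)."""

    def __init__(self, k: int = 4) -> None:
        self.k = k
        # deque(maxlen=k) silently drops the oldest frame once it is full.
        self.frames: Deque[Any] = deque(maxlen=k)

    def reset(self, first_frame: Any) -> List[Any]:
        # At episode start there is no history, so repeat the first frame k times.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_frame)
        return list(self.frames)

    def step(self, frame: Any) -> List[Any]:
        # Append the newest frame; the oldest one falls off the left end.
        self.frames.append(frame)
        return list(self.frames)
```

Sign clipping keeps reward magnitudes comparable across games with very different score scales, at the cost of discarding how large each reward was.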
PyBullet:
Install dependencies:
poetry install -E pybullet
Train agents:
poetry run python ppo_continuous_action.py
Train agents with experiment tracking:
poetry run python ppo_continuous_action.py --track --capture-video
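ppo_continuous_action.py handles continuous action spaces. The blog post describes sampling such actions from a normal distribution whose log standard deviation is a learned parameter that does not depend on the state, with the action dimensions treated as independent. The sketch below shows only that sampling and log-probability math, assuming the policy has already produced mean and log_std. The function names are hypothetical, and the real script works with tensors rather than Python lists.

```python
import math
import random
from typing import List, Sequence


def sample_action(mean: Sequence[float], log_std: Sequence[float]) -> List[float]:
    """Draw one value per action dimension from N(mean, exp(log_std) ** 2).

    log_std is assumed to be a state-independent parameter vector, so the same
    log_std is used for every observation; only mean depends on the state.
    """
    return [random.gauss(mu, math.exp(ls)) for mu, ls in zip(mean, log_std)]


def diag_gaussian_log_prob(
    action: Sequence[float], mean: Sequence[float], log_std: Sequence[float]
) -> float:
    """Log-density of an action under independent per-dimension Gaussians.

    Independence means the joint log-prob is the sum of per-dimension terms.
    """
    total = 0.0
    for a, mu, ls in zip(action, mean, log_std):
        std = math.exp(ls)
        total += -((a - mu) ** 2) / (2.0 * std**2) - ls - 0.5 * math.log(2.0 * math.pi)
    return total
```

Summing per-dimension log-probs is what lets PPO compute a single probability ratio per transition even when the action is a vector.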
Gym-microRTS:
Install dependencies:
poetry install -E gym-microrts
Train agents:
poetry run python ppo_multidiscrete.py
Train agents with experiment tracking:
poetry run python ppo_multidiscrete.py --track --capture-video
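ppo_multidiscrete.py targets a MultiDiscrete action space, where one action is a vector of several discrete choices. A common approach, and the one this sketch assumes, treats each component as an independent categorical distribution, so the joint log-probability is the sum of the per-component log-probabilities. The helper names and the flat-logits layout (one logit vector per component, concatenated in order) are illustrative assumptions.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one categorical component."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def multidiscrete_log_prob(
    flat_logits: Sequence[float], nvec: Sequence[int], action: Sequence[int]
) -> float:
    """Joint log-prob of a MultiDiscrete action with independent components.

    nvec[i] is the number of choices for component i (as in a
    MultiDiscrete(nvec) space); flat_logits holds sum(nvec) values.
    """
    if len(flat_logits) != sum(nvec) or len(action) != len(nvec):
        raise ValueError("logits/action do not match nvec")
    total, start = 0.0, 0
    for n, a in zip(nvec, action):
        component = list(flat_logits[start:start + n])
        total += log_softmax(component)[a]  # independent components: log-probs add
        start += n
    return total
```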
Train agents with invalid action masking:
poetry run python ppo_multidiscrete_mask.py
Train agents with invalid action masking and experiment tracking:
poetry run python ppo_multidiscrete_mask.py --track --capture-video
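ppo_multidiscrete_mask.py adds invalid action masking. One common way to implement it, assumed in this sketch, is to overwrite the logits of invalid actions with a very large negative number before the softmax, so those actions get zero probability and are never sampled. The function name and the fill value -1e8 are assumptions for illustration.

```python
import math
from typing import List, Sequence


def masked_categorical_probs(
    logits: Sequence[float], mask: Sequence[bool], fill: float = -1e8
) -> List[float]:
    """Softmax over logits after replacing invalid actions' logits with `fill`."""
    if not any(mask):
        raise ValueError("at least one action must be valid")
    masked = [x if valid else fill for x, valid in zip(logits, mask)]
    m = max(masked)
    # exp(fill - m) underflows to 0.0, so invalid actions get zero probability.
    exps = [math.exp(x - m) for x in masked]
    z = sum(exps)
    return [e / z for e in exps]
```

For example, masked_categorical_probs([1.0, 5.0, 1.0], [True, False, True]) returns [0.5, 0.0, 0.5]: the highest logit is ignored because its action is invalid.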
Atari with EnvPool:
Install dependencies:
poetry install -E envpool
Train agents:
poetry run python ppo_atari_envpool.py
Train agents with experiment tracking:
poetry run python ppo_atari_envpool.py --track
Solve Pong-v5 in 5 mins:
poetry run python ppo_atari_envpool.py --clip-coef=0.2 --num-envs=16 --num-minibatches=8 --num-steps=128 --update-epochs=3
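The flags in the Pong command above control how much data each PPO update uses and how far the policy may move. Assuming CleanRL-style arithmetic (batch size = num_envs * num_steps, split into num_minibatches minibatches and reused for update_epochs passes), that command collects 16 * 128 = 2048 transitions per update and takes 3 * 8 = 24 gradient steps on minibatches of 256. --clip-coef=0.2 sets the epsilon of the clipped surrogate objective. The helper names below are hypothetical.

```python
from typing import Sequence, Tuple


def ppo_batch_sizes(num_envs: int, num_steps: int, num_minibatches: int) -> Tuple[int, int]:
    """Rollout batch size and minibatch size under the assumed arithmetic."""
    batch_size = num_envs * num_steps  # transitions collected per policy update
    minibatch_size = batch_size // num_minibatches
    return batch_size, minibatch_size


def clipped_surrogate_loss(
    ratios: Sequence[float], advantages: Sequence[float], clip_coef: float = 0.2
) -> float:
    """Mean PPO policy loss; each ratio is pi_new(a|s) / pi_old(a|s)."""
    losses = []
    for ratio, adv in zip(ratios, advantages):
        unclipped = -adv * ratio
        clipped = -adv * min(max(ratio, 1.0 - clip_coef), 1.0 + clip_coef)
        # The max of the two losses is the pessimistic (min) bound on the
        # surrogate objective, which removes the incentive to move the ratio
        # outside [1 - clip_coef, 1 + clip_coef].
        losses.append(max(unclipped, clipped))
    return sum(losses) / len(losses)
```

With a positive advantage, a ratio of 1.5 is capped at 1.2 (loss -1.2), while a ratio of 0.5 keeps its unclipped loss of -0.5, so shrinking a good action's probability is still fully penalized.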
Reach a game score of 400 in Breakout-v5 with PPO in about 1 hour (a side-effect-free 3-4x speedup over ppo_atari.py with SyncVectorEnv):
poetry run python ppo_atari_envpool.py --gym-id Breakout-v5
Procgen:
Install dependencies:
poetry install -E procgen
Train agents:
poetry run python ppo_procgen.py
Train agents with experiment tracking:
poetry run python ppo_procgen.py --track
To reproduce the results obtained with openai/baselines, install our fork at https://github.com/vwxyzjn/baselines, then follow the scripts in scripts/baselines. To reproduce our results, follow the scripts in scripts/ours.
Citation:
@inproceedings{shengyi2022the37implementation,
author = {Huang, Shengyi and Dossa, Rousslan Fernand Julien and Raffin, Antonin and Kanervisto, Anssi and Wang, Weixun},
title = {The 37 Implementation Details of Proximal Policy Optimization},
booktitle = {ICLR Blog Track},
year = {2022},
note = {https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/},
url = {https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/}
}