Status: Archive (code is provided as-is, no updates expected)
Generate interfaces for interpreting vision models trained using RL.
The core utilities used to compute feature visualization, attribution and dimensionality reduction can be found in lucid.scratch.rl_util, a submodule of Lucid. These are demonstrated in this notebook. The code here leverages these utilities to build HTML interfaces similar to the above demo.
Supported platforms: macOS and Ubuntu, Python 3.7, TensorFlow <= 1.14
git clone https://github.com/openai/understanding-rl-vision.git
pip install -e understanding-rl-vision
The main script processes checkpoint files saved by RL code:
from understanding_rl_vision import rl_clarity
rl_clarity.run('path/to/checkpoint/file', output_dir='path/to/directory')
An example checkpoint file can be downloaded here, or can be generated using the example script. Checkpoint files for a number of pre-trained models are indexed here.
The precise format required of the checkpoint file, along with a full list of keyword arguments, can be found in the function's docstring.
The script will create an interface.html file, along with directories containing images (which can take up several GB), at the location specified by output_dir.
By default, the script will also create some files in the directory of the checkpoint file, in an rl-clarity subdirectory. These contain all the necessary information extracted from the model and environment for re-creating the same interface. To create these files in a temporary location instead, set load_kwargs={'temp_files': True}. To re-create an interface using existing files, set load_kwargs={'resample': False}.
The slowest part of the script is computing the attribution in all the required combinations. If you set trajectories_kwargs={'num_envs': num_envs, 'num_steps': num_steps}, then num_envs trajectories will be collected, each of length num_steps, and the script will distribute the trajectories among the MPI workers for computing the attribution. The memory requirements of each worker scale with num_steps, which defaults to 512 (about as large as a machine with 34 GB of memory can typically handle). The default num_envs is 8, so if you have 8 GPUs available, it is best to use 8 MPI workers to save time.
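As a back-of-the-envelope sketch (illustrative only, not part of the library), the defaults above imply the following split of work and memory across workers; the 34 GB figure is the rough per-512-steps estimate quoted above:

```python
# Rough planning sketch (not from understanding_rl_vision): estimates how
# trajectory settings map onto MPI workers.

def plan_attribution(num_envs=8, num_steps=512, num_workers=8):
    """Trajectories per worker and approximate memory per worker (GB)."""
    trajectories_per_worker = -(-num_envs // num_workers)  # ceiling division
    # Memory scales with num_steps; ~34 GB is quoted for the default of 512.
    approx_gb_per_worker = 34 * num_steps / 512
    return trajectories_per_worker, approx_gb_per_worker

print(plan_attribution())                               # -> (1, 34.0)
print(plan_attribution(num_steps=256, num_workers=4))   # -> (2, 17.0)
```

Halving num_steps halves the per-worker memory estimate, at the cost of shorter trajectories.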
The script should take a few hours to run, but if it is taking too long, then you can tell it to ignore the first couple of non-input layers by setting layer_kwargs={'discard_first_n': 2}, for example. These layers take the longest to compute attribution for, since they have the highest spatial resolution, and they are usually not very informative anyway.
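To see why the earliest layers dominate, note that the number of spatial positions to attribute over falls geometrically with each strided layer. A hypothetical illustration (the 64x64 input size and the strides are assumptions for the sketch, not taken from any particular model):

```python
# Illustrative only: attribution work per layer grows with that layer's
# spatial resolution, so the earliest (highest-resolution) layers cost most.

def positions_per_layer(input_size=64, strides=(2, 2, 2, 2)):
    """Spatial positions (h * w) at each successive strided layer."""
    size = input_size
    positions = []
    for s in strides:
        size //= s
        positions.append(size * size)
    return positions

print(positions_per_layer())  # -> [1024, 256, 64, 16]
```

Under these assumptions, discarding the first two layers removes the two most expensive entries in the list.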
By default, attribution is only computed for the value function, since computing attribution for every logit of the policy greatly multiplies the script's running time. To compute attribution for the policy, set attr_policy=True. To offset the increased computational load, you may wish to restrict attribution to a single layer by setting layer_kwargs={'name_contains_one_of': ['2b']}, for example.
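The multiplier from enabling policy attribution can be sketched as follows (the action count of 15 is just an illustrative value, not tied to any particular environment):

```python
# Illustrative sketch: number of scalar outputs attribution is computed for.

def num_attribution_targets(num_actions, attr_policy=False):
    # The value function is always attributed; enabling policy attribution
    # adds one target per policy logit.
    return 1 + (num_actions if attr_policy else 0)

print(num_attribution_targets(15))                    # -> 1  (value only)
print(num_attribution_targets(15, attr_policy=True))  # -> 16
```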
To save disk space, the hover effect for isolating single attribution channels can be disabled by setting attr_single_channels=False, though this will not have much effect on speed.
As shown in this demo, interfaces are divided into a number of sections.
There is also a script for training a model using PPO2 from Baselines, and saving a checkpoint file in the required format:
from understanding_rl_vision import rl_clarity
rl_clarity.train(env_name='coinrun_old', save_dir='path/to/directory')
This script is mainly intended to illustrate the checkpoint file format, and has not been thoroughly tested. The example script demonstrates how to train a model and then generate an interface for it.
To generate interfaces, the Svelte source must be compiled to JavaScript. At installation, the module will automatically attempt to download the pre-compiled JavaScript from a remote copy, though this copy is not guaranteed to be kept up-to-date.
To obtain an up-to-date copy, or for development, you may wish to re-compile the JavaScript locally. To do this, first install Node.js if you have not already. On macOS:
brew install node
You will then be able to re-compile the JavaScript:
python -c 'from understanding_rl_vision import rl_clarity; rl_clarity.recompile_js()'
The svelte3 package provides generic functions for compiling version 3 of Svelte to JavaScript or HTML. These can be used as an easy-to-use command-line tool:
python -c 'from understanding_rl_vision import svelte3; svelte3.compile_html("path/to/svelte/file", "path/to/html/file")'
Detailed usage instructions can be found in the functions' docstrings.
Please cite using the following BibTeX entry:
@article{hilton2020understanding,
  author = {Hilton, Jacob and Cammarata, Nick and Carter, Shan and Goh, Gabriel and Olah, Chris},
  title = {Understanding RL Vision},
  journal = {Distill},
  year = {2020},
  note = {https://distill.pub/2020/understanding-rl-vision},
  doi = {10.23915/distill.00029}
}