Visual reinforcement learning benchmark for controllability
MIT License
BridgeWalk is a partially observed reinforcement learning environment with dynamics of varying stochasticity. The player needs to walk along a bridge to reach a goal location. When the player walks off the bridge into the water, the current moves the player randomly until it is washed back onto the shore. A good agent in this environment avoids this stochastic trap. The implementation of BridgeWalk is based on the Crafter environment.
You can play the game yourself with an interactive window and keyboard input. The mapping from keys to actions, health level, and inventory state are printed to the terminal.
# Install with GUI
pip3 install 'bridgewalk[gui]'
# Start the game
bridgewalk
# Alternative way to start the game
python3 -m bridgewalk.run_gui
The following optional command line flags are available:
Flag | Default | Description |
---|---|---|
`--window <width> <height>` | 800 800 | Window size in pixels, used as width and height. |
`--fps <integer>` | 5 | How many times to update the environment per second. |
`--record <filename>.mp4` | None | Record a video of the trajectory. |
`--view <width> <height>` | 15 15 | The layout size in cells; determines view distance. |
`--length <integer>` | None | Time limit for the episode. |
`--seed <integer>` | None | Determines world generation and creatures. |
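As a rough illustration of how the flags in the table combine, the sketch below assembles an argument list for the `bridgewalk` command. The helper `build_command` is hypothetical and not part of the package; it only uses the flag names listed above and leaves out any flag that is not set, so the documented defaults apply.

```python
def build_command(window=None, fps=None, record=None,
                  view=None, length=None, seed=None):
    """Build an argument list for the `bridgewalk` GUI command.

    Hypothetical helper, not part of the package. Flags left as None
    are omitted so that the command falls back to its own defaults.
    """
    cmd = ['bridgewalk']
    if window is not None:
        cmd += ['--window', str(window[0]), str(window[1])]
    if fps is not None:
        cmd += ['--fps', str(fps)]
    if record is not None:
        cmd += ['--record', record]
    if view is not None:
        cmd += ['--view', str(view[0]), str(view[1])]
    if length is not None:
        cmd += ['--length', str(length)]
    if seed is not None:
        cmd += ['--seed', str(seed)]
    return cmd


cmd = build_command(window=(600, 600), fps=10, record='run.mp4', seed=0)
print(' '.join(cmd))
# prints: bridgewalk --window 600 600 --fps 10 --record run.mp4 --seed 0
```

With the GUI extras installed, the resulting list could be passed to `subprocess.run(cmd)` to launch the game from a script.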
Installation: pip3 install -U bridgewalk
The environment follows the OpenAI Gym interface:
import bridgewalk
env = bridgewalk.Env(seed=0)
obs = env.reset()
assert obs.shape == (64, 64, 3)
done = False
while not done:
  action = env.action_space.sample()
  obs, reward, done, info = env.step(action)
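The loop above can be wrapped in a small helper that also accumulates the episode return. This is a minimal sketch, not code from the package: `run_episode` and `StubEnv` are hypothetical names, and `StubEnv` is only a stand-in with the same `reset`/`step` shape so that the snippet runs without BridgeWalk installed.

```python
import random


def run_episode(env, policy):
    """Run one episode and return (total_reward, num_steps).

    Works with any environment whose step() returns the 4-tuple
    (obs, reward, done, info) used in the example above.
    """
    obs = env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done:
        action = policy(obs)
        obs, reward, done, info = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward, steps


class StubEnv:
    """Hypothetical stand-in environment, NOT BridgeWalk itself."""

    def __init__(self, length=10):
        self._length = length
        self._step = 0

    def reset(self):
        self._step = 0
        return None  # A real environment returns an image here.

    def step(self, action):
        self._step += 1
        done = self._step >= self._length
        reward = 1.0 if done else 0.0  # Arbitrary reward for the demo.
        return None, reward, done, {}


rng = random.Random(0)


def random_policy(obs):
    # Five discrete actions, indices 0 to 4, chosen uniformly at random.
    return rng.randrange(5)


total, steps = run_episode(StubEnv(length=10), random_policy)
print(total, steps)
```

Because the helper only relies on `reset` and `step`, passing `bridgewalk.Env(seed=0)` in place of `StubEnv` should work the same way once the package is installed.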
A reward of +1 is given the first time in each episode when the agent reaches the island at the end of the bridge.
Episodes terminate after 250 steps.
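The reward and termination rules can be modelled in a few lines. The sketch below is a toy model under stated assumptions, not the environment's actual code: the class `EpisodeBookkeeping` and the `on_island` flag are hypothetical, and it assumes that only the time limit ends an episode, since the text does not say whether reaching the island does.

```python
class EpisodeBookkeeping:
    """Toy model of the reward and time limit rules (not the real code)."""

    def __init__(self, length=250):
        self._length = length
        self.reset()

    def reset(self):
        self._steps = 0
        self._rewarded = False

    def step(self, on_island):
        """Advance one step and return (reward, done)."""
        self._steps += 1
        reward = 0.0
        if on_island and not self._rewarded:
            # The +1 is paid only the first time the island is reached.
            reward = 1.0
            self._rewarded = True
        # Assumption: the episode ends only when the time limit is hit.
        done = self._steps >= self._length
        return reward, done


book = EpisodeBookkeeping()
# Pretend the agent reaches the island at step index 100 and stays there.
results = [book.step(on_island=(t >= 100)) for t in range(250)]
episode_return = sum(reward for reward, _ in results)
print(episode_return)
```

Under these rules the return of an episode is either 0 or 1, so the average return over many episodes can be read as a success rate.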
Each observation is an RGB image that shows a local view of the world around the player.
The action space is categorical. Each action is an integer index representing one of the possible actions:
Integer | Name | Description |
---|---|---|
0 | `noop` | Do nothing. |
1 | `move_left` | Walk left. |
2 | `move_right` | Walk right. |
3 | `move_up` | Walk up. |
4 | `move_down` | Walk down. |
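For readability in agent code, the indices in the table can be given names. The enum below is a convenience sketch and not something the package is known to export; the class name `Action` is hypothetical, while the member names and values are copied from the table.

```python
import enum


class Action(enum.IntEnum):
    """Names for the action indices listed in the table above."""
    noop = 0
    move_left = 1
    move_right = 2
    move_up = 3
    move_down = 4


# Look up a name from an index, or an index from a name.
print(Action(3).name)          # prints: move_up
print(int(Action.move_left))   # prints: 1
```

Since `IntEnum` members are integers, a call such as `env.step(Action.move_right)` is expected to behave like `env.step(2)`, though converting with `int(...)` first is the safer choice.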
For questions, please open an issue on GitHub.