S-RL Toolbox: Reinforcement Learning (RL) and State Representation Learning (SRL) for Robotics
MIT License
This repository is no longer maintained. If you are looking for RL implementations, see Stable-Baselines3; for a training framework, see the RL Baselines3 Zoo.
This repository was made to evaluate State Representation Learning methods using Reinforcement Learning. It integrates various RL algorithms (PPO, A2C, ARS, ACKTR, DDPG, DQN, ACER, CMA-ES, SAC, TRPO) with different SRL methods (see SRL Repo), and provides automatic logging, plotting, saving and loading of trained agents. Training is efficient: 1 million steps in 1 hour with an 8-core CPU and 1 Titan X GPU.
We also release customizable Gym environments for working with simulation (Kuka arm, Mobile Robot in PyBullet, running at 250 FPS on an 8-core machine) and real robots (Baxter Robot, Robobo with ROS).
Related papers:
Documentation is available online: https://s-rl-toolbox.readthedocs.io/
Here is a quick example of how to train a PPO2 agent on the MobileRobotGymEnv-v0 environment for 10 000 steps using 4 parallel processes:
python -m rl_baselines.train --algo ppo2 --no-vis --num-cpu 4 --num-timesteps 10000 --env MobileRobotGymEnv-v0
The complete command (logs will be saved in the logs/ folder):
python -m rl_baselines.train --algo rl_algo --env env1 --log-dir logs/ --srl-model raw_pixels --num-timesteps 10000 --no-vis
To use the robot's position as input instead of pixels, pass --srl-model ground_truth instead of --srl-model raw_pixels.
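The two commands above differ only in a few flags, so it can help to assemble them programmatically when launching several runs. The following is a minimal sketch, not part of the toolbox: `build_train_command` is a hypothetical helper, and it only emits flags that are quoted in this README.

```python
def build_train_command(algo, env, srl_model="raw_pixels", num_timesteps=10000,
                        log_dir="logs/", num_cpu=None, visualize=False):
    """Assemble the argument list for `python -m rl_baselines.train`.

    Hypothetical helper: it only knows the flags shown in this README.
    """
    cmd = ["python", "-m", "rl_baselines.train",
           "--algo", algo,
           "--env", env,
           "--log-dir", log_dir,
           "--srl-model", srl_model,
           "--num-timesteps", str(num_timesteps)]
    if num_cpu is not None:
        # Number of parallel processes, as in the quick example
        cmd += ["--num-cpu", str(num_cpu)]
    if not visualize:
        # Disable visualization, as both README commands do
        cmd.append("--no-vis")
    return cmd


# PPO2 on the mobile robot, using the robot's position instead of pixels
cmd = build_train_command("ppo2", "MobileRobotGymEnv-v0",
                          srl_model="ground_truth", num_cpu=4)
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root; using a list rather than a single string avoids shell quoting issues.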
Python 3 is required (Python 2 is not supported because of OpenAI Baselines).
Note: we are using Stable Baselines, a fork of OpenAI Baselines with a unified interface and other improvements (e.g. tensorboard support).
Clone the project (note the --recursive argument, because we are using git submodules):
git clone git@github.com:araffin/robotics-rl-srl.git --recursive
sudo apt-get install swig
Install the dependencies using the environment.yml file (for anaconda users) in the current environment:
conda env create --file environment.yml
source activate py35
Please read the documentation for more details.
Several algorithms from Stable Baselines have been integrated along with some evolution strategies and SAC:
Please read the documentation for more details on how to train/load an agent on discrete/continuous actions, and how to add your own rl algorithm.
This repository also allows hyperparameter search, using hyperband or hyperopt, for the implemented RL algorithms.
For example, here is the command for a hyperband search on PPO2 with ground truth states on the mobile robot environment:
python -m rl_baselines.hyperparam_search --optimizer hyperband --algo ppo2 --env MobileRobotGymEnv-v0 --srl-model ground_truth
All the environments we propose follow the OpenAI Gym interface. We also extended this interface (adding extra methods) to work with SRL methods (see State Representation Learning Models).
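To make the idea of "Gym interface plus extra methods" concrete, here is a toy sketch. It is a hypothetical illustration, not the toolbox's implementation: the class `ToyMobileRobot1D`, the method name `get_ground_truth`, the one-hot observation, and the choice that a wall hit does not end the episode are all assumptions. It does not import Gym; it only mirrors the classic `reset()` / `step()` signatures.

```python
import random


class ToyMobileRobot1D:
    """Toy 1D environment with Gym-style reset()/step() and one extra method."""

    def __init__(self, length=10, seed=0):
        self.length = length              # cells are numbered 0..length
        self.rng = random.Random(seed)
        self.target = length              # target sits on the last cell
        self.pos = 0

    def reset(self):
        # Start anywhere except on the target
        self.pos = self.rng.randint(0, self.length - 1)
        return self._observation()

    def step(self, action):
        # Discrete action space with 2 actions: 0 moves left, 1 moves right
        new_pos = self.pos + (1 if action == 1 else -1)
        if new_pos < 0:
            # Wall hit: stay in place (assumption: the episode continues)
            reward, done = -1, False
        else:
            self.pos = new_pos
            done = self.pos == self.target
            reward = 1 if done else 0
        return self._observation(), reward, done, {}

    def get_ground_truth(self):
        """Extra, non-Gym method (assumed name): the X position of the robot."""
        return float(self.pos)

    def _observation(self):
        # Stand-in for an image: a one-hot row marking the robot's cell
        return [1.0 if i == self.pos else 0.0 for i in range(self.length + 1)]


env = ToyMobileRobot1D(length=5)
obs = env.reset()
done, total = False, 0
while not done:
    obs, reward, done, info = env.step(1)   # always move right
    total += reward
print(total, env.get_ground_truth())
```

The extra method is what lets an SRL pipeline compare a learned state against the true robot position while the agent itself only sees `obs`.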
Environment previews: Kuka environment, Mobile Robot environment, Racing car environment, Omnidirectional robot environment.
Name | Action space (discrete) | Action space (continuous) | Rewards | Ground truth |
---|---|---|---|---|
KukaButton | 6 actions (3D cardinal direction) | 3 axis (3D cardinal direction) (1) | 1 when target reached, -1 when too far from target or when table is hit, otherwise 0 (2) (3) | the X,Y,Z position of the effector (4) |
KukaRandButton | 6 actions (3D cardinal direction) | 3 axis (3D cardinal direction) (1) | 1 when target reached, -1 when too far from target or when table is hit, otherwise 0 (2) (3) | the X,Y,Z position of the effector (4) |
Kuka2Button | 6 actions (3D cardinal direction) | 3 axis (3D cardinal direction) (1) | 1 when the first target is reached, 1 when the second target is reached, -1 when too far from target or when table is hit, otherwise 0 (2) | the X,Y,Z position of the effector (4) |
KukaMovingButton | 6 actions (3D cardinal direction) | 3 axis (3D cardinal direction) (1) | 1 when target reached, -1 when too far from target or when table is hit, otherwise 0 (2) (3) | the X,Y,Z position of the effector (4) |
MobileRobot | 4 actions (2D cardinal direction) | 2 axis (2D cardinal direction) | 1 when target reached, -1 for a wall hit, otherwise 0 (2) | the X,Y position of the robot (4) |
MobileRobot2Target | 4 actions (2D cardinal direction) | 2 axis (2D cardinal direction) | 1 when target reached, -1 for a wall hit, otherwise 0 (2) | the X,Y position of the robot (4) |
MobileRobot1D | 2 actions (1D cardinal direction) | 1 axis (1D cardinal direction) | 1 when target reached, -1 for a wall hit, otherwise 0 (2) | the X position of the robot (4) |
MobileRobotLineTarget | 4 actions (2D cardinal direction) | 2 axis (2D cardinal direction) | 1 when target reached, -1 for a wall hit, otherwise 0 (2) | the X,Y position of the robot (4) |
CarRacing | 4 actions (left, right, accelerate, brake) | 3 axis (steering, accelerate, brake) | -100 when out of bounds, otherwise -0.1 | the X,Y position of the car (4) |
OmniRobot | 4 actions (2D cardinal direction) | 2 axis (2D cardinal direction) | 1 when target reached, -1 for a wall hit, otherwise 0 (2) | the X,Y position of the robot (4) |
1. The action space can use 6-axis arm joint control with the --joints flag.
2. The reward can be the Euclidean distance to the target with the --shape-reward flag.
3. When using --shape-reward and --continuous, the reward for hitting the button is 50 and the reward for being out of bounds is -250. This prevents the agent from deliberately hitting the table to end the episode early and obtain a higher reward.
4. The ground truth can be the relative position from the agent to the target, by changing the RELATIVE_POS constant in the environment file.
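The reward and ground truth variants described in the table and notes can be sketched as plain functions. This is an illustration under stated assumptions, not the toolbox's code: the function names are hypothetical, the shaped reward is assumed to be the negated distance (the notes do not give the sign), and the relative ground truth is assumed to be target minus robot.

```python
import math


def sparse_reward(reached_target, hit_wall):
    """Sparse reward from the table: 1 on target, -1 on wall hit, otherwise 0."""
    if reached_target:
        return 1
    if hit_wall:
        return -1
    return 0


def shaped_reward(robot_pos, target_pos):
    """Euclidean-distance reward; negating it is an assumption of this sketch."""
    squared = sum((r - t) ** 2 for r, t in zip(robot_pos, target_pos))
    return -math.sqrt(squared)


def ground_truth(robot_pos, target_pos, relative=False):
    """Absolute robot position, or position relative to the target.

    `relative` plays the role of the RELATIVE_POS constant; the sign
    convention (target minus robot) is an assumption.
    """
    if relative:
        return tuple(t - r for r, t in zip(robot_pos, target_pos))
    return tuple(robot_pos)


robot, target = (0.0, 0.0), (3.0, 4.0)
print(sparse_reward(False, False))                 # no event this step
print(shaped_reward(robot, target))                # distance is 5, so -5.0
print(ground_truth(robot, target, relative=True))  # (3.0, 4.0)
```

With a distance-based reward the agent receives a learning signal at every step, whereas the sparse variant only gives feedback on reaching the target or hitting a wall.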
The available environments are:
Please read the documentation for more details (e.g. adding a custom environment).
Please look at the SRL Repo to learn how to train a state representation model.
Then edit config/srl_models.yaml and set the right path to use the learned state representations.
The available state representation models are:
Please read the documentation for more details (e.g. adding a custom SRL model).
If a submodule is not downloaded:
git submodule update --init
If you have trouble installing mpi4py, make sure you have the following installed:
sudo apt-get install libopenmpi-dev openmpi-bin openmpi-doc
The inverse kinematics function has trouble finding a solution when the arm is fully straight and the arm must bend to reach the requested point.
This work is supported by the DREAM project through the European Union Horizon 2020 FET research and innovation program under grant agreement No 640891.
If you use this toolbox, please cite:
@article{Raffin18,
title={S-RL Toolbox: Environments, Datasets and Evaluation Metrics for State Representation Learning},
author={Raffin, Antonin and Hill, Ashley and Traor{\'e}, Ren{\'e} and Lesort, Timoth{\'e}e and D{\'\i}az-Rodr{\'\i}guez, Natalia and Filliat, David},
journal={arXiv preprint arXiv:1809.09369},
year={2018}
}