A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.
MIT License
- `setup.py` (@power-edge)
- `requirements.txt` (remove duplicates from `setup.py`)

Full Changelog: https://github.com/DLR-RM/rl-baselines3-zoo/compare/v2.2.1...v2.3.0
Published by araffin 11 months ago
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx
- Removed `gym` dependency, the package is still required for some pretrained agents.
- Added `--eval-env-kwargs` to `train.py` (@Quentin18)
- Added `ppo_lstm` to `hyperparams_opt.py` (@technocrat13)
- Upgraded to `pybullet_envs_gymnasium>=0.4.0`
- Replaced deprecated `optuna.suggest_uniform(...)` by `optuna.suggest_float(..., low=..., high=...)`
- `shlex.split()`
- Added `rl_zoo3/hyperparams_opt.py` type hints
- Added `rl_zoo3/exp_manager.py` type hints

Published by araffin about 1 year ago
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx
Full Changelog: https://github.com/DLR-RM/rl-baselines3-zoo/compare/v2.0.0...v2.1.0
Published by araffin over 1 year ago
Warning
Stable-Baselines3 (SB3) v2.0 will be the last one supporting python 3.7 (end of life in June 2023).
We highly recommend you upgrade to Python >= 3.8.
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx
To upgrade:

```
pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade
```

or simply (the RL Zoo depends on SB3 and SB3 contrib):

```
pip install rl_zoo3 --upgrade
```
- Renamed `CarRacing-v1` to `CarRacing-v2` in hyperparameters
- Added `--n-timesteps` argument to adjust the length of the video
- Fixed `record_video` steps (before it was stepping in a closed env)

Full Changelog: https://github.com/DLR-RM/rl-baselines3-zoo/compare/v1.8.0...v2.0.0
Published by araffin over 1 year ago
We have run a massive, open-source benchmark of all algorithms on all environments from the RL Zoo: Open RL Benchmark
New documentation: https://rl-baselines3-zoo.readthedocs.io/en/master/
Warning
Stable-Baselines3 (SB3) v1.8.0 will be the last one to use Gym as a backend.
Starting with v2.0.0, Gymnasium will be the default backend (though SB3 will have compatibility layers for Gym envs).
You can find a migration guide here.
If you want to try the SB3 v2.0 alpha version, you can take a look at PR #1327.
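The compatibility layer mentioned in the warning has to bridge two step APIs: old Gym returned a 4-tuple from `env.step()`, while Gymnasium returns a 5-tuple that splits `done` into `terminated` and `truncated`. A minimal sketch of that conversion (a hypothetical helper for illustration, not SB3's actual shim):

```python
# Hypothetical converter from the old Gym 4-tuple step result to the
# Gymnasium 5-tuple. Old Gym signalled time limits via
# info["TimeLimit.truncated"]; Gymnasium reports them separately.
def old_step_to_new(obs, reward, done, info):
    truncated = bool(info.get("TimeLimit.truncated", False))
    # `terminated` means a true MDP termination, not a time limit
    terminated = done and not truncated
    return obs, reward, terminated, truncated, info
```

An episode ended by a time limit thus maps to `terminated=False, truncated=True`, which matters for bootstrapping the value of the last state.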
- Upgraded to the new `HerReplayBuffer` implementation that supports multiple envs
- Removed `TimeFeatureWrapper` for Panda and Fetch envs, as the new replay buffer should handle timeout
- Upgraded `highway-env` version to 1.5 and setuptools to v65.5 for the CI
- Added `use_auth_token` for push to hub util
- Fixed `gym-minigrid` policy (from `MlpPolicy` to `MultiInputPolicy`)
- Added support for `ruff` (fast alternative to flake8) in the Makefile
- Replaced deprecated `optuna.suggest_loguniform(...)` by `optuna.suggest_float(..., log=True)`
- Switched to `ruff` and `pyproject.toml`
- Removed `online_sampling` and `max_episode_length` argument when using `HerReplayBuffer`
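For intuition on the `optuna.suggest_float(..., log=True)` migration above: a log-uniform suggestion samples uniformly in log-space, which suits quantities spanning several orders of magnitude such as learning rates. A stand-in sketch of that sampling (illustrative only, not the Optuna API):

```python
import math
import random

# Illustrative stand-in for what a log-uniform hyperparameter suggestion
# (optuna's suggest_float with log=True) samples: uniform in log-space.
def suggest_float_log(low, high, rng=None):
    u = (rng or random).random()  # uniform draw in [0, 1)
    return math.exp(math.log(low) + u * (math.log(high) - math.log(low)))

lr = suggest_float_log(1e-5, 1e-2)
assert 1e-5 <= lr <= 1e-2
```

With a plain uniform draw over [1e-5, 1e-2], 90% of samples would land above 1e-3; the log-uniform draw spreads them evenly over the decades.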
Published by araffin almost 2 years ago
Upgraded to SB3 v1.7.0; added support for python config files.
We are currently creating an open source benchmark, please read https://github.com/openrlbenchmark/openrlbenchmark/issues/7 if you want to help
- The `--yaml-file` argument was renamed to `-conf` (`--conf-file`), as python files are now supported too
- Changed `net_arch=[dict(pi=.., vf=..)]` to `net_arch=dict(pi=.., vf=..)`
- Added `monitor_kwargs` parameter
- Added `env_kwargs` of `render:True` under the hood for panda-gym v1 envs in `enjoy` replay to match visualization behavior of other envs
- Added `-tags/--wandb-tags` argument to `train.py` to add tags to the wandb run
- Allowed `python -m rl_zoo3.cli` to be called directly
- Fixed `--gym-package` when using subprocesses
- `scripts/plot_train.py` plots models such that newer models appear on top of older ones
- `from gym import spaces`
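The `net_arch` change above only drops the enclosing list; the dict of policy (`pi`) and value (`vf`) layer sizes is now passed directly. A before/after sketch (the layer sizes are illustrative):

```python
# Deprecated form (SB3 < 1.7): the policy/value spec wrapped in a list
old_net_arch = [dict(pi=[64, 64], vf=[64, 64])]

# Current form (SB3 >= 1.7): the dict is passed directly
new_net_arch = dict(pi=[64, 64], vf=[64, 64])

# Same content, one less level of nesting
assert old_net_arch[0] == new_net_arch
```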
Published by araffin about 2 years ago
You can now install the RL Zoo via pip: `pip install rl-zoo3`. It comes with a basic command line interface (`rl_zoo3 train|enjoy|plot_train|all_plots`) that has the same interface as the scripts (`train.py|enjoy.py|...`).
You can use the RL Zoo from outside, for instance with the experimental Stable Baselines3 Jax version (SBX).
File: `train.py` (you can use `python train.py --algo sbx_tqc --env Pendulum-v1` afterward)

```python
import rl_zoo3
import rl_zoo3.train
from rl_zoo3.train import train
from sbx import TQC

# Add new algorithm
rl_zoo3.ALGOS["sbx_tqc"] = TQC
rl_zoo3.train.ALGOS = rl_zoo3.ALGOS
rl_zoo3.exp_manager.ALGOS = rl_zoo3.ALGOS

if __name__ == "__main__":
    train()
```
The same works for `rl_zoo3 train` and `rl_zoo3 enjoy`.
Published by araffin about 2 years ago
- Added `--yaml-file` argument option for `train.py` to read hyperparameters from custom yaml files (@JohannesUl)
- Added `custom_object` parameter on `record_video.py` (@Affonso-Gui)
- Set `optimize_memory_usage` to `False` for DQN/QR-DQN on `record_video.py` (@Affonso-Gui)
- In `ExperimentManager` `_maybe_normalize`, set `training` to `False` for eval envs
- Added `-P` argument using tqdm and rich

Published by araffin about 2 years ago
- `HistoryWrapper`
- Added `--device` flag (@gregwar)
- Added `--max-total-trials` parameter to help with distributed optimization (@ernestum)
- Added `vec_env_wrapper` support in the config (works the same as `env_wrapper`)
- Added `RecurrentPPO` support (aka `ppo_lstm`)
- Fixed `Reacher-v3` name in PPO hyperparameter file
- Set `optimize_memory_usage` to `False` for DQN/QR-DQN on Atari games, as it is not compatible with `handle_timeout_termination` (in `replay_buffer_kwargs`)
- When the pruner is set to `"none"`, use `NopPruner` instead of diverted `MedianPruner` (@qgallouedec)

Published by araffin over 2 years ago
- Support for Weights & Biases experiment tracking via the `--track` flag (@vwxyzjn)
- `RawStatisticsCallback` (@vwxyzjn, see https://github.com/DLR-RM/rl-baselines3-zoo/pull/216)

Published by araffin over 2 years ago
Published by araffin almost 3 years ago
WARNING: This version will be the last one supporting Python 3.6 (end of life in Dec 2021). We highly recommend you upgrade to Python >= 3.7.
Published by araffin about 3 years ago
- Added `--load-last-checkpoint` (@SammyRamone)
- Fixed `TypeError` for `gym.Env` class entry points in `ExperimentManager` (@schuderer)

Published by araffin over 3 years ago
- `HER` is now a replay buffer class and no longer an algorithm
- `PlotNoiseRatioCallback`
- `PlotActionWrapper`
- Renamed `'lr'` key in Optuna param dict to `'learning_rate'` so the dict can be directly passed to SB3 methods (@justinkterry)
- `utils.callbacks.ParallelTrainCallback`
- Added `--load-last-checkpoint` option for the enjoy script
- (`plotly` package required)
- `scripts/plot_train.py`
- Fixed `get_latest_run_id()` so it works on Windows too (@NicolasHaeffner)
- `HER` replay buffer
- Added `is_bullet()` to `ExperimentManager`
- `close()` for the enjoy script
- Updated `requirements.txt` (@amy12xx)
- Updated `SAC` and `TD3` search spaces
- Added `panda-gym` environments (@qgallouedec)

Published by araffin over 3 years ago
Blog post: https://araffin.github.io/post/sb3/
- `HER` handling action noise
- `HER` and enjoy script
- `HER` hyperparameters

Published by araffin over 3 years ago
- `LinearNormalActionNoise`
- `sb3_contrib` is now required
- `TimeFeatureWrapper` was moved to the contrib repo
- `plot_train.py` script with updated `plot_training_success.py`
- Changed `n_episodes_rollout` to `train_freq` tuple to match latest version of SB3
- `VecEnv` class to use for multiprocessing
- `TQC`
- `QR-DQN` from SB3 contrib
- `ExperimentManager` class
- Replaced `make_env` with SB3 built-in `make_vec_env`
- `utils/utils.py` (done)
- `PPO` atari hyperparameters (removed vf clipping)
- `A2C` atari hyperparameters (eps value of the optimizer)
- `DQN` hyperparameters for CartPole