rl-baselines3-zoo | PyTorch Ecosystem Directory

rl-baselines3-zoo - RL-Zoo3 v2.3.0 Latest Release

Published by araffin 7 months ago

Breaking Changes

Updated default hyperparameters for TD3/DDPG to be more consistent with SAC
Upgraded MuJoCo envs hyperparameters to v4 (pre-trained agents need to be updated)
Upgraded to SB3 >= 2.3.0

Other

Added test dependencies to setup.py (@power-edge)
Simplify dependencies of requirements.txt (remove duplicates from setup.py)

Full Changelog: https://github.com/DLR-RM/rl-baselines3-zoo/compare/v2.2.1...v2.3.0

rl-baselines3-zoo - RL-Zoo3 v2.2.1

Published by araffin 11 months ago

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

Breaking Changes

Removed gym dependency; the package is still required for some pretrained agents.
Upgraded to SB3 >= 2.2.1
Upgraded to Huggingface-SB3 >= 3.0
Upgraded to pytablewriter >= 1.0

New Features

Added --eval-env-kwargs to train.py (@Quentin18)
Added ppo_lstm to hyperparams_opt.py (@technocrat13)

Bug fixes

Upgraded to pybullet_envs_gymnasium>=0.4.0
Removed old hacks (for instance limiting offpolicy algorithms to one env at test time)

Other

Updated docker image, removed support for X server
Replaced deprecated optuna.suggest_uniform(...) by optuna.suggest_float(..., low=..., high=...)
Switched to ruff for sorting imports
Updated tests to use shlex.split()
Fixed rl_zoo3/hyperparams_opt.py type hints
Fixed rl_zoo3/exp_manager.py type hints
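
The move to shlex.split() in the tests can be illustrated with a minimal sketch. The command string below is invented for illustration and is not taken from the test suite; it only shows why shell-aware splitting matters when an argument contains quotes and spaces.

```python
import shlex

# Hypothetical command line with a quoted argument that contains a space.
command = "python train.py --algo ppo --env CartPole-v1 -params 'policy_kwargs:dict(net_arch=[64, 64])'"

# str.split() cuts the quoted argument at the inner space and keeps the quotes.
naive = command.split()

# shlex.split() follows shell quoting rules and keeps the argument whole.
args = shlex.split(command)

print(len(naive), len(args))
print(args[-1])
```

The resulting list can be passed as-is to subprocess.run() without shell=True.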

rl-baselines3-zoo - RL-Zoo3 v2.1.0

Published by araffin about 1 year ago

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

Breaking Changes

Dropped python 3.7 support
SB3 now requires PyTorch 1.13+
Upgraded to SB3 >= 2.1.0
Upgraded to Huggingface-SB3 >= 2.3
Upgraded to Optuna >= 3.0
Upgraded to cloudpickle >= 2.2.1

New Features

Added python 3.11 support

Full Changelog: https://github.com/DLR-RM/rl-baselines3-zoo/compare/v2.0.0...v2.1.0

rl-baselines3-zoo - RL-Zoo3 v2.0.0: Gymnasium Support

Published by araffin over 1 year ago

Warning
Stable-Baselines3 (SB3) v2.0 will be the last one supporting python 3.7 (end of life in June 2023).
We highly recommend that you upgrade to Python >= 3.8.

To upgrade:

pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade

or simply (rl zoo depends on SB3 and SB3 contrib):

pip install rl_zoo3 --upgrade

Breaking Changes

Fixed bug in HistoryWrapper, now returns the correct obs space limits
Upgraded to SB3 >= 2.0.0
Upgraded to Huggingface-SB3 >= 2.2.5
Upgraded to Gym API 0.26+, RL Zoo3 no longer works with Gym 0.21

New Features

Added Gymnasium support
Gym 0.26+ patches to continue working with pybullet and TimeLimit wrapper

Bug fixes

Renamed CarRacing-v1 to CarRacing-v2 in hyperparameters
Huggingface push to hub now accepts a --n-timesteps argument to adjust the length of the video
Fixed record_video steps (before it was stepping in a closed env)

Full Changelog: https://github.com/DLR-RM/rl-baselines3-zoo/compare/v1.8.0...v2.0.0

rl-baselines3-zoo - RL-Zoo3 v1.8.0 : New Documentation, OpenRL Benchmark, Multi-Env HerReplayBuffer

Published by araffin over 1 year ago

Release 1.8.0 (2023-04-07)

We have run a massive and open source benchmark of all algorithms on all environments from the RL Zoo: Open RL Benchmark

New documentation: https://rl-baselines3-zoo.readthedocs.io/en/master/

Warning
Stable-Baselines3 (SB3) v1.8.0 will be the last one to use Gym as a backend.
Starting with v2.0.0, Gymnasium will be the default backend (though SB3 will have compatibility layers for Gym envs).
You can find a migration guide here.
If you want to try the SB3 v2.0 alpha version, you can take a look at PR #1327.

Breaking Changes

Upgraded to SB3 >= 1.8.0
Upgraded to new HerReplayBuffer implementation that supports multiple envs
Removed TimeFeatureWrapper for Panda and Fetch envs, as the new replay buffer should handle timeout.

New Features

Tuned hyperparameters for RecurrentPPO on Swimmer
Documentation is now built using Sphinx and hosted on Read the Docs
Open RL Benchmark

Bug fixes

Set highway-env version to 1.5 and setuptools to v65.5 for the CI
Removed use_auth_token for push to hub util
Reverted from v3 to v2 for HumanoidStandup, Reacher, InvertedPendulum and InvertedDoublePendulum since they were not part of the mujoco refactoring (see https://github.com/openai/gym/pull/1304)
Fixed gym-minigrid policy (from MlpPolicy to MultiInputPolicy)

Documentation

Documentation is now built using Sphinx and hosted on Read the Docs: https://rl-baselines3-zoo.readthedocs.io/en/master/

Other

Added support for ruff (fast alternative to flake8) in the Makefile
Removed Gitlab CI file
Replaced deprecated optuna.suggest_loguniform(...) by optuna.suggest_float(..., log=True)
Switched to ruff and pyproject.toml
Removed online_sampling and max_episode_length argument when using HerReplayBuffer

rl-baselines3-zoo - RL-Zoo3 v1.7.0 : Added support for python config files

Published by araffin almost 2 years ago

Release 1.7.0 (2023-01-10)

SB3 v1.7.0, added support for python config files

We are currently creating an open source benchmark, please read https://github.com/openrlbenchmark/openrlbenchmark/issues/7 if you want to help

Breaking Changes

--yaml-file argument was renamed to -conf (--conf-file), as python files are now supported too
Upgraded to SB3 >= 1.7.0 (changed net_arch=[dict(pi=.., vf=..)] to net_arch=dict(pi=.., vf=..))
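
As a rough illustration of the net_arch change, here is a minimal sketch of a migration helper. The function name migrate_net_arch is hypothetical and not part of SB3 or the RL Zoo; it only covers the single-dict-in-a-list form named in the changelog entry above.

```python
from typing import Any


def migrate_net_arch(net_arch: Any) -> Any:
    """Hypothetical helper: turn [dict(pi=..., vf=...)] into dict(pi=..., vf=...)."""
    if isinstance(net_arch, list) and len(net_arch) == 1 and isinstance(net_arch[0], dict):
        # Old form: a one-element list wrapping the pi/vf dict.
        return dict(net_arch[0])
    # Plain lists such as [64, 64] and already-migrated dicts are returned unchanged.
    return net_arch


old_style = [dict(pi=[64, 64], vf=[64, 64])]
new_style = migrate_net_arch(old_style)
print(new_style)
```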

New Features

Specifying custom policies in yaml file is now supported (@Rick-v-E)
Added monitor_kwargs parameter
Handle the env_kwargs of render:True under the hood for panda-gym v1 envs in enjoy replay to match the visualization behavior of other envs
Added support for python config files
Tuned hyperparameters for PPO on Swimmer
Added -tags/--wandb-tags argument to train.py to add tags to the wandb run
Added a sb3 version tag to the wandb run
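
A python config file might look like the sketch below. This is written from memory of the zoo's documentation, so treat the details as assumptions: the file name my_config.py is hypothetical, and the module-level dict is assumed to be called hyperparams and to be keyed by environment id, mirroring the layout of the yaml files.

```python
# my_config.py (hypothetical file name)
# Assumption: the zoo reads a module-level dict named `hyperparams`,
# keyed by environment id, with the same keys as the yaml files.
hyperparams = {
    "MountainCarContinuous-v0": dict(
        normalize=True,
        n_envs=1,
        n_timesteps=20000.0,
        policy="MlpPolicy",
        # net_arch uses the dict form expected by SB3 >= 1.7.0.
        policy_kwargs=dict(net_arch=dict(pi=[64, 64], vf=[64, 64])),
    )
}
```

Such a file would then presumably be passed with -conf my_config.py in place of a yaml file. The advantage over yaml is that values can be computed or imported rather than written as strings.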

Bug fixes

Allow python -m rl_zoo3.cli to be called directly
Fixed a bug where custom environments were not found despite passing --gym-package when using subprocesses
Fixed TRPO hyperparameters for MinitaurBulletEnv-v0, MinitaurBulletDuckEnv-v0, HumanoidBulletEnv-v0, InvertedDoublePendulumBulletEnv-v0 and InvertedPendulumSwingupBulletEnv

Other

scripts/plot_train.py plots models such that newer models appear on top of older ones.
Added additional type checking using mypy
Standardized the use of from gym import spaces

rl-baselines3-zoo - RL-Zoo3 v1.6.2: The RL Zoo is now a package!

Published by araffin about 2 years ago

Highlights

You can use the RL Zoo from outside, for instance with the experimental Stable Baselines3 Jax version (SBX).

File: train.py (you can use python train.py --algo sbx_tqc --env Pendulum-v1 afterward)

import rl_zoo3
import rl_zoo3.train
from rl_zoo3.train import train

from sbx import TQC

# Add new algorithm
rl_zoo3.ALGOS["sbx_tqc"] = TQC
rl_zoo3.train.ALGOS = rl_zoo3.ALGOS
rl_zoo3.exp_manager.ALGOS = rl_zoo3.ALGOS

if __name__ == "__main__":
    train()

Breaking Changes

RL Zoo is now a python package
low pass filter was removed

New Features

RL Zoo cli: rl_zoo3 train and rl_zoo3 enjoy

rl-baselines3-zoo - SB3 v1.6.1: Progress bar and custom yaml file

Published by araffin about 2 years ago

Breaking Changes

Upgraded to Stable-Baselines3 (SB3) >= 1.6.1
Upgraded to sb3-contrib >= 1.6.1

New Features

Added --yaml-file argument option for train.py to read hyperparameters from custom yaml files (@JohannesUl)

Bug fixes

Added custom_object parameter on record_video.py (@Affonso-Gui)
Changed optimize_memory_usage to False for DQN/QR-DQN on record_video.py (@Affonso-Gui)
In ExperimentManager _maybe_normalize, set training to False for eval envs to prevent normalization stats from being updated in eval envs (e.g. in EvalCallback) (@pchalasani).
Only one env is used to get the action space while optimizing hyperparameters and it is correctly closed (@SammyRamone)
Added progress bar via the -P argument using tqdm and rich
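
The normalization fix above can be pictured with a toy example. The class below is a simplified stand-in written for this note, not the SB3 normalization wrapper; it only shows how a training flag freezes running statistics so that evaluation episodes do not alter them.

```python
class RunningMeanNormalizer:
    """Toy stand-in for a normalizing env wrapper (not the SB3 class)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self.training = True  # when False, the statistics are frozen

    def normalize(self, obs: float) -> float:
        if self.training:
            # Incremental mean update, performed in training mode only.
            self.count += 1
            self.mean += (obs - self.mean) / self.count
        return obs - self.mean


train_env = RunningMeanNormalizer()
for obs in (1.0, 2.0, 3.0):
    train_env.normalize(obs)

# Eval env: reuse the training statistics but freeze them.
eval_env = RunningMeanNormalizer()
eval_env.count, eval_env.mean = train_env.count, train_env.mean
eval_env.training = False
eval_env.normalize(100.0)  # an outlier seen during evaluation

print(train_env.mean, eval_env.mean)
```

With training left at True, the outlier seen during evaluation would have shifted the mean used for later observations.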

rl-baselines3-zoo - SB3 v1.6.0: Huggingface hub integration, Recurrent PPO (PPO LSTM)

Published by araffin about 2 years ago

Release 1.6.0 (2022-08-05)

Breaking Changes

Change default value for number of hyperparameter optimization trials from 10 to 500. (@ernestum)
Derive number of intermediate pruning evaluations from number of time steps (1 evaluation per 100k time steps.) (@ernestum)
Updated default --eval-freq from 10k to 25k steps
Update default horizon to 2 for the HistoryWrapper
Upgrade to Stable-Baselines3 (SB3) >= 1.6.0
Upgrade to sb3-contrib >= 1.6.0
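
The rule of one pruning evaluation per 100k time steps amounts to simple arithmetic. The sketch below is an assumption about how it could be computed: the function name is hypothetical, and both the integer division and the floor of one evaluation are guesses rather than confirmed behavior.

```python
def n_pruning_evaluations(n_timesteps: int, steps_per_eval: int = 100_000) -> int:
    """Hypothetical helper: one intermediate evaluation per 100k time steps."""
    # Assumption: always keep at least one evaluation, even for short runs.
    return max(1, n_timesteps // steps_per_eval)


for budget in (50_000, 1_000_000, 2_500_000):
    print(budget, n_pruning_evaluations(budget))
```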

New Features

Support setting PyTorch's device with the --device flag (@gregwar)
Add --max-total-trials parameter to help with distributed optimization. (@ernestum)
Added vec_env_wrapper support in the config (works the same as env_wrapper)
Added Huggingface hub integration
Added RecurrentPPO support (aka ppo_lstm)
Added autodownload for "official" sb3 models from the hub
Added Humanoid-v3, Ant-v3, Walker2d-v3 models for A2C (@pseudo-rnd-thoughts)
Added MsPacman models

Bug fixes

Fix Reacher-v3 name in PPO hyperparameter file
Pinned ale-py==0.7.4 until new SB3 version is released
Fix enjoy / record videos with LSTM policy
Fix bug with environments that have a slash in their name (@ernestum)
Changed optimize_memory_usage to False for DQN/QR-DQN on Atari games; if you want to save RAM, you need to deactivate handle_timeout_termination in the replay_buffer_kwargs
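
To see why a slash in an environment name can cause trouble, consider what happens when the name is used as part of a log path. The sketch below is only an illustration: the helper env_id_to_folder_name is hypothetical, the example id and log folder are invented, and the actual fix in the zoo may work differently.

```python
from pathlib import PurePosixPath


def env_id_to_folder_name(env_id: str) -> str:
    """Hypothetical helper: make an env id usable as a single path component."""
    return env_id.replace("/", "-")


env_id = "ALE/Pong-v5"  # example id containing a slash

# Used directly, the slash silently creates an extra directory level.
raw_path = PurePosixPath("logs") / "ppo" / f"{env_id}_1"

# Sanitized, the run folder stays a single path component.
safe_path = PurePosixPath("logs") / "ppo" / f"{env_id_to_folder_name(env_id)}_1"

print(raw_path.parts)
print(safe_path.parts)
```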

Other

When pruner is set to "none", use NopPruner instead of diverted MedianPruner (@qgallouedec)

rl-baselines3-zoo - SB3 v1.5.0: Support for Weights and Biases experiment tracking

Published by araffin over 2 years ago

Release 1.5.0 (2022-03-25)

Support for Weights and Biases experiment tracking

Breaking Changes

Upgrade to Stable-Baselines3 (SB3) >= 1.5.0
Upgrade to sb3-contrib >= 1.5.0
Upgraded to gym 0.21

New Features

Verbose mode for each trial (when doing hyperparam optimization) can now be activated using the debug mode (verbose == 2)
Support experiment tracking via Weights and Biases via the --track flag (@vwxyzjn)
Support tracking raw episodic stats via RawStatisticsCallback (@vwxyzjn, see https://github.com/DLR-RM/rl-baselines3-zoo/pull/216)

Bug fixes

Policies saved during optimization with distributed Optuna now load on new systems (@jkterry)
Fixed script for recording video that was not up to date with the enjoy script

rl-baselines3-zoo - SB3 v1.4.0: TRPO, ARS and multi env training for off-policy algorithms

Published by araffin over 2 years ago

Breaking Changes

Dropped python 3.6 support
Upgrade to Stable-Baselines3 (SB3) >= 1.4.0
Upgrade to sb3-contrib >= 1.4.0

New Features

Added mujoco hyperparameters
Added MuJoCo pre-trained agents
Added script to parse best hyperparameters of an optuna study
Added TRPO support
Added ARS support and pre-trained agents

Documentation

Replace front image

rl-baselines3-zoo - SB3 v1.3.0: rliable plots and bug fixes

Published by araffin almost 3 years ago

WARNING: This version will be the last one supporting Python 3.6 (end of life in Dec 2021). We highly recommend that you upgrade to Python >= 3.7.

Breaking Changes

Upgrade to panda-gym 1.1.1
Upgrade to Stable-Baselines3 (SB3) >= 1.3.0
Upgrade to sb3-contrib >= 1.3.0

New Features

Added support for using rliable for performance comparison

Bug fixes

Fix training with Dict obs and channel last images

Other

Updated docker image
Constrained gym version: gym>=0.17,<0.20
Better hyperparameters for A2C/PPO on Pendulum

rl-baselines3-zoo - SB3 v1.2.0

Published by araffin about 3 years ago

Breaking Changes

Upgrade to Stable-Baselines3 (SB3) >= 1.2.0
Upgrade to sb3-contrib >= 1.2.0

Bug fixes

Fix --load-last-checkpoint (@SammyRamone)
Fix TypeError for gym.Env class entry points in ExperimentManager (@schuderer)
Fix usage of callbacks during hyperparameter optimization (@SammyRamone)

Other

Added python 3.9 to Github CI
Increased DQN replay buffer size for Atari games (@nikhilrayaprolu)

rl-baselines3-zoo - SB3 v1.1.0

Published by araffin over 3 years ago

Breaking Changes

Upgrade to Stable-Baselines3 (SB3) >= 1.1.0
Upgrade to sb3-contrib >= 1.1.0
Add timeout handling (cf SB3 doc)
HER is now a replay buffer class and no longer an algorithm
Removed PlotNoiseRatioCallback
Removed PlotActionWrapper
Changed 'lr' key in Optuna param dict to 'learning_rate' so the dict can be directly passed to SB3 methods (@justinkterry)
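
The point of renaming 'lr' to 'learning_rate' is that the sampled dict can be unpacked straight into a constructor. A minimal sketch, where both rename_lr_key and make_model are hypothetical stand-ins written for this note rather than zoo or SB3 functions:

```python
from typing import Any, Dict


def rename_lr_key(params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: map the old 'lr' key to 'learning_rate'."""
    migrated = dict(params)  # copy so the caller's dict is not mutated
    if "lr" in migrated:
        migrated["learning_rate"] = migrated.pop("lr")
    return migrated


def make_model(learning_rate: float, gamma: float) -> str:
    # Stand-in for a constructor that accepts these keyword arguments.
    return f"model(learning_rate={learning_rate}, gamma={gamma})"


old_params = {"lr": 3e-4, "gamma": 0.99}

# With the old key, make_model(**old_params) would raise a TypeError.
print(make_model(**rename_lr_key(old_params)))
```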

New Features

Add support for recording videos of best models and checkpoints (@mcres)
Add support for recording videos of training experiments (@mcres)
Add support for dictionary observations
Added experimental parallel training (with utils.callbacks.ParallelTrainCallback)
Added support for using multiple envs for evaluation
Added --load-last-checkpoint option for the enjoy script
Save Optuna study object at the end of hyperparameter optimization and plot the results (plotly package required)
Allow passing multiple folders to scripts/plot_train.py
Flag to save logs and optimal policies from each training run (@justinkterry)

Bug fixes

Fixed video rendering for PyBullet envs on Linux
Fixed get_latest_run_id() so it works in Windows too (@NicolasHaeffner)
Fixed video record when using HER replay buffer

Documentation

Updated README (dict obs are now supported)

Other

Added is_bullet() to ExperimentManager
Simplify close() for the enjoy script
Updated docker image to include latest black version
Updated TD3 Walker2D model (thanks @modanesh)
Fixed typo in plot title (@scottemmons)
Minimum cloudpickle version added to requirements.txt (@amy12xx)
Fixed atari-py version (ROM missing in newest release)
Updated SAC and TD3 search spaces
Cleanup eval_freq documentation and variable name changes (@justinkterry)
Add clarifying print statement when printing saved hyperparameters during optimization (@justinkterry)
Clarify n_evaluations help text (@justinkterry)
Simplified hyperparameters files making use of defaults
Added new TQC+HER agents
Add panda-gym environments (@qgallouedec)

rl-baselines3-zoo - Stable-Baselines3 v1.0 - 100+ pre-trained models

Published by araffin over 3 years ago

Blog post: https://araffin.github.io/post/sb3/

Breaking Changes

Upgrade to SB3 >= 1.0
Upgrade to sb3-contrib >= 1.0

New Features

Added 100+ trained agents + benchmark file
Add support for loading saved model under python 3.8+ (no retraining possible)
Added Robotics pre-trained agents (@sgillen)

Bug fixes

Bug fixes for HER handling action noise
Fixed double reset bug with HER and enjoy script

Documentation

Added doc about plotting scripts

Other

Updated HER hyperparameters

rl-baselines3-zoo - Big refactor - SB3 upgrade - Last before v1.0

Published by araffin over 3 years ago

Breaking Changes

Removed LinearNormalActionNoise
Evaluation is now deterministic by default, except for Atari games
sb3_contrib is now required
TimeFeatureWrapper was moved to the contrib repo
Replaced old plot_train.py script with updated plot_training_success.py
Renamed n_episodes_rollout to train_freq tuple to match latest version of SB3
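
For the last item, a minimal sketch of how an old hyperparameter dict could be migrated. The helper name is hypothetical, and the handling of non-positive values (treated here as "not episode-based") is an assumption about the old convention rather than something stated in these notes.

```python
from typing import Any, Dict


def migrate_rollout_args(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: replace n_episodes_rollout=N by train_freq=(N, "episode")."""
    migrated = dict(kwargs)  # copy so the caller's dict is not mutated
    n_episodes = migrated.pop("n_episodes_rollout", None)
    if n_episodes is not None and n_episodes > 0:
        # Episode-based collection becomes a (frequency, unit) tuple.
        migrated["train_freq"] = (n_episodes, "episode")
    # Assumption: otherwise an existing step-based train_freq is kept as-is.
    return migrated


print(migrate_rollout_args({"n_episodes_rollout": 1, "gamma": 0.98}))
print(migrate_rollout_args({"n_episodes_rollout": -1, "train_freq": 64}))
```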