d3rlpy

An offline deep reinforcement learning library

MIT License


d3rlpy - Release v2.6.0 (Latest Release)

Published by takuseno about 2 months ago

New Algorithm

ReBRAC has been added to d3rlpy! Please check a reproduction script here.
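A minimal sketch of trying it on a bundled toy dataset follows; the config class name (ReBRACConfig) is assumed from the usual v2 naming pattern, and the reproduction script remains the authoritative reference for tuned hyperparameters.

import d3rlpy

# toy dataset bundled with d3rlpy; the reproduction script targets D4RL tasks
dataset, env = d3rlpy.datasets.get_pendulum()

# assumption: the config class is named ReBRACConfig, following the v2 <Algo>Config pattern
rebrac = d3rlpy.algos.ReBRACConfig().create(device="cpu:0")

rebrac.fit(
    dataset,
    n_steps=10000,
    n_steps_per_epoch=1000,
    evaluators={"environment": d3rlpy.metrics.EnvironmentEvaluator(env)},
)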

Enhancement

  • DeepMind Control support has been added. You can install the dependencies by running d3rlpy install dm_control. Please check an example script here.
  • use_layer_norm option has been added to VectorEncoderFactory.
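A minimal sketch of enabling layer normalization through this new option (hyperparameters are illustrative):

import d3rlpy

# encoder with layer normalization enabled in the hidden layers
encoder = d3rlpy.models.VectorEncoderFactory(
    hidden_units=[256, 256],
    use_layer_norm=True,
)

sac = d3rlpy.algos.SACConfig(
    actor_encoder_factory=encoder,
    critic_encoder_factory=encoder,
).create(device="cpu:0")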

Bugfix

  • Fix return-to-go calculation for Decision Transformer.
  • Fix custom model documentation.
d3rlpy - Release v2.5.0

Published by takuseno 5 months ago

New Algorithm

Cal-QL has been added to d3rlpy in v2.5.0! Please check a reproduction script here. To support faithful reproduction, SparseRewardTransitionPicker has also been added and is used in the reproduction script.
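A minimal sketch of offline pretraining followed by online finetuning with Cal-QL (the toy dataset, hyperparameters, and the create_fifo_replay_buffer helper are illustrative assumptions; the reproduction script shows the authoritative setup, including SparseRewardTransitionPicker):

import d3rlpy

# toy dataset; the reproduction script uses sparse-reward AntMaze datasets
dataset, env = d3rlpy.datasets.get_pendulum()

cal_ql = d3rlpy.algos.CalQLConfig().create(device="cpu:0")

# offline pretraining
cal_ql.fit(dataset, n_steps=10000, n_steps_per_epoch=1000)

# online finetuning, which is what Cal-QL's calibrated values are designed for
buffer = d3rlpy.dataset.create_fifo_replay_buffer(limit=100000, env=env)
cal_ql.fit_online(env, buffer, n_steps=10000)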

Custom Algorithm Example

One of the most frequent questions is "How can I implement a custom algorithm on top of d3rlpy?". Now, a new example script has been added to answer this question. Based on this example, you can build your own algorithm while utilizing the whole training pipeline provided by d3rlpy. Please check the script here.

Enhancement

  • Exporting Decision Transformer models as TorchScript and ONNX has been implemented. You can use this feature via the save_policy method in the same way as with Q-learning algorithms (see the sketch after this list).
  • Tuple observation support has been added to PyTorch/ONNX export.
  • The return-to-go calculation for Q-learning algorithms has been modified and is now skipped when return-to-go is not necessary.
  • n_updates option has been added to fit_online method to control update-to-data (UTD) ratio.
  • write_at_termination option has been added to ReplayBuffer.
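Regarding the first item, a minimal sketch of exporting a trained Decision Transformer policy (the output format is selected by the file extension, as with Q-learning algorithms; hyperparameters are illustrative):

import d3rlpy

dataset, env = d3rlpy.datasets.get_pendulum()

dt = d3rlpy.algos.DecisionTransformerConfig().create(device="cpu:0")
dt.fit(dataset, n_steps=1000, n_steps_per_epoch=1000)

# TorchScript export
dt.save_policy("policy.pt")

# ONNX export
dt.save_policy("policy.onnx")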

Bugfix

  • Action scaling has been fixed for D4RL datasets.
  • Default replay buffer creation in the fit_online method has been fixed.
d3rlpy - Release v2.4.0

Published by takuseno 8 months ago

Tuple observations

In v2.4.0, d3rlpy supports tuple observations.

import numpy as np
import d3rlpy

observations = [np.random.random((1000, 100)), np.random.random((1000, 32))]
actions = np.random.random((1000, 4))
rewards = np.random.random((1000, 1))
terminals = np.random.randint(2, size=(1000, 1))
dataset = d3rlpy.dataset.MDPDataset(
    observations=observations,
    actions=actions,
    rewards=rewards,
    terminals=terminals,
)

You can find an example script here.

Enhancements

  • logging_steps and logging_strategy options have been added to fit and fit_online methods (thanks, @claudius-kienle )
  • Logging with WandB is now supported. (thanks, @claudius-kienle )
  • Goal-conditioned envs in Minari are now supported.

Bugfix

  • Fix errors for distributed training.
  • OPE documentation has been fixed.
d3rlpy - Release v2.3.0

Published by takuseno 11 months ago

Distributed data parallel training

Distributed data parallel training with multiple nodes and GPUs has been one of the most requested features. Now, it's finally available, and it's extremely easy to use.

Example:

# train.py
from typing import Dict

import d3rlpy


def main() -> None:
    # GPU version:
    # rank = d3rlpy.distributed.init_process_group("nccl")
    rank = d3rlpy.distributed.init_process_group("gloo")
    print(f"Start running on rank={rank}.")

    # GPU version:
    # device = f"cuda:{rank}"
    device = "cpu:0"

    # setup algorithm
    cql = d3rlpy.algos.CQLConfig(
        actor_learning_rate=1e-3,
        critic_learning_rate=1e-3,
        alpha_learning_rate=1e-3,
    ).create(device=device)

    # prepare dataset
    dataset, env = d3rlpy.datasets.get_pendulum()

    # disable logging on rank != 0 workers
    logger_adapter: d3rlpy.logging.LoggerAdapterFactory
    evaluators: Dict[str, d3rlpy.metrics.EvaluatorProtocol]
    if rank == 0:
        evaluators = {"environment": d3rlpy.metrics.EnvironmentEvaluator(env)}
        logger_adapter = d3rlpy.logging.FileAdapterFactory()
    else:
        evaluators = {}
        logger_adapter = d3rlpy.logging.NoopAdapterFactory()

    # start training
    cql.fit(
        dataset,
        n_steps=10000,
        n_steps_per_epoch=1000,
        evaluators=evaluators,
        logger_adapter=logger_adapter,
        show_progress=rank == 0,
        enable_ddp=True,
    )

    d3rlpy.distributed.destroy_process_group()


if __name__ == "__main__":
    main()

You need to use the torchrun command to start training, which should already be installed once you install PyTorch.

$ torchrun \
   --nnodes=1 \
   --nproc_per_node=3 \
   --rdzv_id=100 \
   --rdzv_backend=c10d \
   --rdzv_endpoint=localhost:29400 \
   train.py

In this case, 3 processes will be launched and start the training loop. DecisionTransformer-based algorithms also support this distributed training feature.

The example is also available here.

Minari support (thanks, @grahamannett !)

Minari is an OSS library that provides a standard format for offline reinforcement learning datasets. Now, d3rlpy provides easy access to this library.

You can install Minari via d3rlpy CLI.

$ d3rlpy install minari

Example:

import d3rlpy

dataset, env = d3rlpy.datasets.get_minari("antmaze-umaze-v0")

iql = d3rlpy.algos.IQLConfig(
    actor_learning_rate=3e-4,
    critic_learning_rate=3e-4,
    batch_size=256,
    weight_temp=10.0,
    max_weight=100.0,
    expectile=0.9,
    reward_scaler=d3rlpy.preprocessing.ConstantShiftRewardScaler(shift=-1),
).create(device="cpu:0")

iql.fit(
    dataset,
    n_steps=1000000,
    n_steps_per_epoch=100000,
    evaluators={"environment": d3rlpy.metrics.EnvironmentEvaluator(env)},
)

Minimize redundant computation

From this version, the computations of some algorithms are optimized to remove redundant inference. As a result, algorithms with dual optimization such as SAC and CQL are significantly faster than in the previous version.

Enhancements

  • GoalConcatWrapper has been added to support goal-conditioned environments.
  • return_to_go has been added to Transition and TransitionMiniBatch.
  • MixedReplayBuffer has been added to sample experiences from two buffers with an arbitrary ratio.
  • initial_temperature now supports 0 in DiscreteSAC.

Bugfix

  • Getting started page has been fixed.
d3rlpy - Release v2.2.0

Published by takuseno 12 months ago

Algorithm

DiscreteDecisionTransformer, a Decision Transformer implementation for discrete action-spaces, has finally been implemented in v2.2.0! The reproduction results with Atari 2600 are available here.

import d3rlpy

dataset, env = d3rlpy.datasets.get_cartpole()

dt = d3rlpy.algos.DiscreteDecisionTransformerConfig(
    batch_size=64,
    num_heads=1,
    learning_rate=1e-4,
    max_timestep=1000,
    num_layers=3,
    position_encoding_type=d3rlpy.PositionEncodingType.SIMPLE,
    encoder_factory=d3rlpy.models.VectorEncoderFactory([128], exclude_last_activation=True),
    observation_scaler=d3rlpy.preprocessing.StandardObservationScaler(),
    context_size=20,
    warmup_tokens=100000,
).create()

dt.fit(
    dataset,
    n_steps=100000,
    n_steps_per_epoch=1000,
    eval_env=env,
    eval_target_return=500,
)

Enhancement

  • Expose action_size and action_space options for manual dataset creation #338
  • FrameStackTrajectorySlicer has been added.

Refactoring

  • Type checking for numpy is enabled. Some parts of the code differentiate data types of numpy arrays, which is checked by mypy.

Bugfix

  • Device error at AWAC #341
  • Invalid batch.intervals #346
    • ⚠️ This fix is important to retain the performance of Q-learning algorithms since v1.1.1.
d3rlpy - Release v2.1.0

Published by takuseno about 1 year ago

Upgrade PyTorch to v2

From this version, d3rlpy requires PyTorch v2 (v1 may still partially work). To do this, the minimum Python version has been bumped to 3.8. This change allows d3rlpy to utilize more advanced features such as torch.compile in upcoming releases.

Healthcheck

From this version, d3rlpy diagnoses dependency health automatically. In this release, the Gym version is checked to make sure you have installed the correct one.

Gymnasium support

d3rlpy now supports Gymnasium as well as Gym. You can use it just the same as Gym. Please check the example for further details.
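A minimal sketch of online training with a Gymnasium environment (the environment id and the create_fifo_replay_buffer helper from the v2 dataset API are illustrative assumptions):

import gymnasium
import d3rlpy

# Gymnasium environments work in the same places Gym environments do
env = gymnasium.make("Pendulum-v1")

sac = d3rlpy.algos.SACConfig().create(device="cpu:0")

# assumption: FIFO buffer helper from the v2 dataset API
buffer = d3rlpy.dataset.create_fifo_replay_buffer(limit=100000, env=env)

sac.fit_online(env, buffer, n_steps=10000)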

d3rlpy install command

To make your life easier, d3rlpy provides the d3rlpy install command to install additional dependencies. This is part of the d3rlpy CLI. Please check the docs for further details.

$ d3rlpy install atari  # Atari 2600 dependencies
$ d3rlpy install d4rl_atari  # Atari 2600 + d4rl-atari dependencies
$ d3rlpy install d4rl  # D4RL dependencies

Refactoring

In this version, the internal design has been refactored. The algorithm implementations and the way models are assigned were the main targets of this refactoring. ⚠️ Because of this change, previously saved models might be incompatible with this version.

Enhancement

  • Added Jupyter Notebook for TPU on Google Colaboratory.
  • Added d3rlpy.notebook_utils to provide utilities for Jupyter Notebook.
  • Updated notebook link #313 (thanks @asmith26 !)

Bugfix

  • Fixed typo docstrings #316 (thanks @asmith26 !)
  • Fixed docker build #311 (thanks @HassamSheikh !)
d3rlpy - Release v2.0.4

Published by takuseno about 1 year ago

Bugfix

  • Fix DiscreteCQL loss metrics #298
  • Fix dump ReplayBuffer #299
  • Fix InitialStateValueEstimationEvaluator #301
  • Fix rendering interface to match the latest Gym version #302

Due to the rendering fix, I recommend reinstalling d4rl-atari if you use it.

$ pip install -U git+https://github.com/takuseno/d4rl-atari
d3rlpy - Release v2.0.3

Published by takuseno over 1 year ago

An emergency patch to fix a bug in the predict_value method #297.

d3rlpy - Release v2.0.2

Published by takuseno over 1 year ago

The major update has finally been released! Since the start of the project, d3rlpy has earned almost 1K GitHub stars ⭐, which is a great milestone. This update includes many major changes.

Upgrade Gym version

From this version, d3rlpy only supports the latest Gym version, 0.26.0. This change allows us to support Gymnasium in a future update.

Algorithm

Clear separation between configuration and algorithm

From this version, each algorithm (e.g. "DQN") has a config class (e.g. "DQNConfig"). This allows us to serialize and deserialize algorithms as described later.

dqn = d3rlpy.algos.DQNConfig(learning_rate=3e-4).create(device="cuda:0")

Decision Transformer

Decision Transformer is finally available! You can check the reproduction code to see how to use it.

import d3rlpy

dataset, env = d3rlpy.datasets.get_pendulum()

dt = d3rlpy.algos.DecisionTransformerConfig(
    batch_size=64,
    learning_rate=1e-4,
    optim_factory=d3rlpy.models.AdamWFactory(weight_decay=1e-4),
    encoder_factory=d3rlpy.models.VectorEncoderFactory(
        [128],
        exclude_last_activation=True,
    ),
    observation_scaler=d3rlpy.preprocessing.StandardObservationScaler(),
    reward_scaler=d3rlpy.preprocessing.MultiplyRewardScaler(0.001),
    context_size=20,
    num_heads=1,
    num_layers=3,
    warmup_steps=10000,
    max_timestep=1000,
).create(device="cuda:0")

dt.fit(
    dataset,
    n_steps=100000,
    n_steps_per_epoch=1000,
    save_interval=10,
    eval_env=env,
    eval_target_return=0.0,
)

Serialization

In this version, d3rlpy introduces a compact serialization format, the d3 format, that includes both hyperparameters and model parameters in a single file. This makes it easy to save checkpoints and reconstruct algorithms for evaluation and deployment.

import d3rlpy

dataset, env = d3rlpy.datasets.get_cartpole()

dqn = d3rlpy.algos.DQNConfig().create()

dqn.fit(dataset, n_steps=10000)

# save as d3 file
dqn.save("model.d3")

# reconstruct the exactly same DQN
new_dqn = d3rlpy.load_learnable("model.d3")

ReplayBuffer

From this version, there is no longer a clear separation between ReplayBuffer and MDPDataset. Instead, ReplayBuffer has unlimited flexibility to support any kind of algorithm and experiment. Please check the documentation for details.
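For example, a buffer backed by a FIFO strategy can serve both as a dataset container and as an online replay buffer (a sketch based on the v2 dataset API; see the documentation for buffers, transition pickers, and the other components):

import d3rlpy

_, env = d3rlpy.datasets.get_cartpole()

# a ReplayBuffer composed from a FIFO storage strategy and bound to an environment
buffer = d3rlpy.dataset.ReplayBuffer(
    d3rlpy.dataset.FIFOBuffer(limit=100000),
    env=env,
)

dqn = d3rlpy.algos.DQNConfig().create()

# the same buffer object is used directly for online training
dqn.fit_online(env, buffer, n_steps=10000)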

d3rlpy - Release v1.1.1

Published by takuseno over 2 years ago

Benchmark

The benchmark results of IQL and NFQ have been added to d3rlpy-benchmarks. Plus, results with more random seeds (up to 10) have been added for all algorithms. The benchmark results are more reliable now.

Documentation

  • More descriptions have been added to the Finetuning tutorial page.
  • An Offline Policy Selection tutorial page has been added.

Enhancements

  • The cloudpickle and GPUtil dependencies have been removed.
  • The Gaussian likelihood computation for MOPO is now more mathematically correct (thanks @tominku )
d3rlpy - Release v1.1.0

Published by takuseno over 2 years ago

MDPDataset

The timestep alignment is now exactly the same as D4RL:

import numpy as np

# observations = [o_1, o_2, ..., o_n]
observations = np.random.random((1000, 10))

# actions = [a_1, a_2, ..., a_n]
actions = np.random.random((1000, 10))

# rewards = [r(o_1, a_1), r(o_2, a_2), ...]
rewards = np.random.random(1000)

# terminals = [t(o_1, a_1), t(o_2, a_2), ...]
terminals = ...

where r(o, a) is the reward function and t(o, a) is the terminal function.

The reason for this change is that many users were confused by the difference between d3rlpy and D4RL. Now they are aligned in the same way. Note that this change might break your existing datasets.
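Putting the arrays together under the new alignment (synthetic data, just for illustration):

import numpy as np
from d3rlpy.dataset import MDPDataset

# n = 1000 transitions, all aligned as [x_1, x_2, ..., x_n]
observations = np.random.random((1000, 10))
actions = np.random.random((1000, 10))
rewards = np.random.random(1000)
terminals = np.random.randint(2, size=1000)

dataset = MDPDataset(observations, actions, rewards, terminals)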

Algorithms

Enhancements

  • AWAC, CRR and IQL use a non-squashed Gaussian policy function.
  • More tutorial pages have been added to the documentation.
  • The software design page has been added to the documentation.
  • The reproduction script for IQL has been added.
  • The progress bar in online training is visually improved in Jupyter Notebook #161 (thanks, @aiueola )
  • NaN checks have been added to MDPDataset.
  • The target_reduction_type and bootstrap options have been removed.

Bugfix

  • The unnecessary test conditions have been removed
  • Typo in dataset.pyx has been fixed #167 (thanks, @zbzhu99 )
  • The details of IQL implementation have been fixed.
d3rlpy - Release v1.0.0

Published by takuseno almost 3 years ago

We are proud to announce that v1.0.0 has finally been released! The first version was released in Aug 2020 with the support of the IPA MITOU program. At the first release, d3rlpy only supported a few algorithms and did not even support online training. After months of constructive feedback and insights from the users and the community, d3rlpy has been established as the first offline deep RL library with support for many online and offline algorithms and unique features. The next chapter towards the ambitious v2.0.0 also starts today. Please stay tuned for the next announcement!

NeurIPS 2021 Offline RL Workshop

The workshop paper about d3rlpy has been presented at the NeurIPS 2021 Offline RL Workshop.
URL: https://arxiv.org/abs/2111.03788

Benchmarks

The full benchmark results are finally available at d3rlpy-benchmarks.

Algorithms

Enhancements

  • deterministic option is added to the collect method
  • rollout_return metric is added to online training
  • random_steps is added to fit_online method
  • --save option is added to d3rlpy CLI commands (thanks, @pstansell )
  • multiplier option is added to reward normalizers
  • many reproduction scripts are added
  • policy_type option is added to BC
  • get_atari_transition function is added for the Atari 2600 offline benchmark procedure

Bugfix

  • document fix (thanks, @araffin )
  • Fix TD3+BC's actor loss function
  • Fix gaussian noise for TD3 exploration

Roadmap towards v2.0.0

  • Sophisticated config system using dataclasses
  • Dump configuration and model parameters in a single file
  • Change MDPDataset format to align with D4RL datasets
  • Support large datasets
  • Support tuple observations
  • Support large-scale data-parallel offline training
  • Support large-scale distributed online training
  • Support Transformer architecture (e.g. Decision Transformer)
  • Speed up training with torch.jit.script and CUDA Graphs
  • Change library name to represent the unification of offline and online
d3rlpy - Release v0.91

Published by takuseno about 3 years ago

Algorithm

RewardScaler

From this version, preprocessors are available for rewards, which allow you to normalize, standardize, and clip the reward values.

import d3rlpy

# normalize
cql = d3rlpy.algos.CQL(reward_scaler="min_max")

# standardize
cql = d3rlpy.algos.CQL(reward_scaler="standardize")

# clip (no string alias; pass the scaler object directly)
cql = d3rlpy.algos.CQL(reward_scaler=d3rlpy.preprocessing.ClipRewardScaler(-1.0, 1.0))

copy_policy_from and copy_q_function_from methods

In a finetuning scenario, you might want to initialize SAC's policy function with a pretrained CQL policy to boost the initial performance. From this version, you can do that as follows:

import d3rlpy

# pretrain with static dataset
cql = d3rlpy.algos.CQL()
cql.fit(...)

# transfer the policy function
sac = d3rlpy.algos.SAC()
sac.copy_policy_from(cql)

# you can also transfer the Q-function
sac.copy_q_function_from(cql)

# finetuning with online algorithm
sac.fit_online(...)

Enhancements

  • show messages for skipping model builds
  • add alpha parameter option to DiscreteCQL
  • keep counting the number of gradient steps
  • allow expanding MDPDataset with larger discrete actions (thanks, @jamartinh )
  • callback function is called every gradient step (previously, it was called every epoch); see the sketch below
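For reference, a sketch of such a callback; the (algo, epoch, total_step) argument order is an assumption based on later versions of the callback interface.

import d3rlpy

dataset, _ = d3rlpy.datasets.get_cartpole()

def callback(algo, epoch, total_step):
    # from this version, this runs on every gradient step
    if total_step % 1000 == 0:
        print(f"epoch={epoch}, total_step={total_step}")

cql = d3rlpy.algos.DiscreteCQL()
cql.fit(dataset, n_epochs=1, callback=callback)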

Bugfix

  • FQE's loss function has been fixed (thanks for the report, @guyk1971)
  • fix documentation build (thanks, @astrojuanlu)
  • fix d4rl dataset conversion for MDPDataset (this has a significant impact on performance with d4rl datasets)
d3rlpy - Release v0.90

Published by takuseno over 3 years ago

Algorithm

Drop data augmentation feature

From this version, the data augmentation feature has been dropped. The reason is that the feature introduced a lot of code complexity. In order to support many algorithms and keep d3rlpy as simple as possible, the feature was dropped. Instead, TorchMiniBatch was introduced internally, and all algorithms became simpler.

collect method

In offline RL experiments, data collection plays an important role, especially when you try new tasks.
From this version, the collect method is finally available.

import d3rlpy
import gym

# prepare environment
env = gym.make('Pendulum-v0')

# prepare algorithm
sac = d3rlpy.algos.SAC()

# prepare replay buffer
buffer = d3rlpy.online.buffers.ReplayBuffer(maxlen=100000, env=env)

# start data collection without updates
sac.collect(env, buffer)

# export to MDPDataset
dataset = buffer.to_mdp_dataset()

# save as file
dataset.dump('pendulum.h5')

Along with this change, random policies are also introduced. These are useful for collecting datasets with a random policy; a usage sketch follows the snippet below.

# continuous action-space
policy = d3rlpy.algos.RandomPolicy()

# discrete action-space
policy = d3rlpy.algos.DiscreteRandomPolicy()
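These policies plug into the same collect workflow shown above, assuming they expose the same collect interface as regular algorithms:

import gym
import d3rlpy

env = gym.make('Pendulum-v0')

# random policy for continuous action-spaces
policy = d3rlpy.algos.RandomPolicy()

buffer = d3rlpy.online.buffers.ReplayBuffer(maxlen=100000, env=env)

# collect exploration data without any learning
policy.collect(env, buffer)

dataset = buffer.to_mdp_dataset()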

Enhancements

  • CQL and BEAR are now closer to the official implementations
  • callback argument has been added to algorithms
  • random datasets have been added for cartpole and pendulum
    • you can specify them via dataset_type='random' in the get_cartpole and get_pendulum methods
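For example, the random cartpole dataset can be loaded like this:

import d3rlpy

# dataset collected by a random policy instead of a trained one
dataset, env = d3rlpy.datasets.get_cartpole(dataset_type='random')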

Bugfix

  • fix action normalization at predict_value method (thanks, @navidmdn )
  • fix seed settings at reproduction codes

What's missing before v1.00?

Currently, I'm benchmarking all algorithms with the d4rl datasets. Through these experiments, I realized that it's very difficult to reproduce the tables reported in the papers because the authors didn't reveal the full hyperparameters, which are tuned for each dataset. So I gave up reproducing the tables and started producing numbers with the official code to see if d3rlpy's results match.

d3rlpy - Release v0.80

Published by takuseno over 3 years ago

Algorithms

New algorithms are introduced in this version.

Model-based RL

Previously, model-based RL was supported, with the model-based specific logic implemented on the dynamics side. This approach enabled us to combine model-based algorithms with arbitrary model-free algorithms. However, it requires complex designs to implement recent model-based RL methods. So the dynamics interface was refactored, and MOPO is the first algorithm showing how d3rlpy supports model-based RL algorithms.

# train dynamics model
from d3rlpy.datasets import get_pendulum
from d3rlpy.dynamics import ProbabilisticEnsembleDynamics
from d3rlpy.metrics.scorer import dynamics_observation_prediction_error_scorer
from d3rlpy.metrics.scorer import dynamics_reward_prediction_error_scorer
from d3rlpy.metrics.scorer import dynamics_prediction_variance_scorer
from sklearn.model_selection import train_test_split

dataset, _ = get_pendulum()

train_episodes, test_episodes = train_test_split(dataset)

dynamics = ProbabilisticEnsembleDynamics(learning_rate=1e-4, use_gpu=True)

dynamics.fit(train_episodes,
             eval_episodes=test_episodes,
             n_epochs=100,
             scorers={
                'observation_error': dynamics_observation_prediction_error_scorer,
                'reward_error': dynamics_reward_prediction_error_scorer,
                'variance': dynamics_prediction_variance_scorer,
             })

# train Model-based RL algorithm
from d3rlpy.algos import MOPO

# pass the trained dynamics model to MOPO via the dynamics argument
mopo = MOPO(dynamics=dynamics)

mopo.fit(dataset, n_steps=100000)

enhancements

  • fitter method has been implemented (thanks @jamartinh )
  • tensorboard_dir replaces the tensorboard flag in the fit method (thanks @navidmdn )
  • show warning messages when unused arguments are passed
  • show comprehensive error messages when action-space is not compatible
  • fit method accepts MDPDataset object
  • dropout option has been implemented in encoders
  • add appropriate __repr__ methods to show pretty outputs when calling print(algo)
  • metrics collection is refactored

bugfix

  • fix core dump errors by pinning the numpy version
  • fix CQL backup
d3rlpy - Release v0.70

Published by takuseno over 3 years ago

Command Line Interface

New commands are added in this version.

record

You can record the video of the evaluation episodes without coding anything.

$ d3rlpy record d3rlpy_logs/CQL_20201224224314/model_100.pt --env-id HopperBulletEnv-v0

# record wrapped environment
$ d3rlpy record d3rlpy_logs/Discrete_CQL_20201224224314/model_100.pt \
    --env-header 'import gym; env = d3rlpy.envs.Atari(gym.make("BreakoutNoFrameskip-v4"), is_eval=True)'

play

You can run evaluation episodes while rendering images.

# record simple environment
$ d3rlpy play d3rlpy_logs/CQL_20201224224314/model_100.pt --env-id HopperBulletEnv-v0

# record wrapped environment
$ d3rlpy play d3rlpy_logs/Discrete_CQL_20201224224314/model_100.pt \
    --env-header 'import gym; env = d3rlpy.envs.Atari(gym.make("BreakoutNoFrameskip-v4"), is_eval=True)'

data-point mask for bootstrapping

Ensemble training of Q-functions has been shown to be a powerful method for achieving robust training. Previously, the bootstrap option has been available for algorithms, but the mask for the Q-function loss was randomly created every time a batch was sampled.

In this version, the create_mask option is available for MDPDataset and ReplayBuffer, which creates a unique mask at each data point.

# offline training
dataset = d3rlpy.dataset.MDPDataset(observations, actions, rewards, terminals, create_mask=True, mask_size=5)
cql = d3rlpy.algos.CQL(n_critics=5, bootstrap=True, target_reduction_type='none')
cql.fit(dataset)

# online training
buffer = d3rlpy.online.buffers.ReplayBuffer(1000000, create_mask=True, mask_size=5)
sac = d3rlpy.algos.SAC(n_critics=5, bootstrap=True, target_reduction_type='none')
sac.fit_online(env, buffer)

As you may have noticed above, target_reduction_type is newly introduced to specify how to aggregate target Q-values. In standard Soft Actor-Critic, target_reduction_type='min'. If you choose 'none', each ensemble Q-function uses its own target value, which is similar to what Bootstrapped DQN does.

better module access

From this version, you can access every module through the top-level d3rlpy package.

# previously
from d3rlpy.datasets import get_cartpole
dataset = get_cartpole()

# v0.70
import d3rlpy
dataset = d3rlpy.datasets.get_cartpole()

new logger style

From this version, structlog is used internally to print information instead of the raw print function. This allows us to emit more structured information. Furthermore, you can control what to show and what to save to the file by overriding the logger configuration.


enhancements

  • soft_q_backup option is added to CQL.
  • Paper Reproduction page has been added to the documentation in order to show the performance with the paper configuration.
  • commit method at D3RLPyLogger returns metrics (thanks, @jamartinh )

bugfix

  • fix epoch count in offline training.
  • fix total_step count in online training.
  • fix typos at documentation (thanks, @pstansell )
d3rlpy - Release v0.61

Published by takuseno over 3 years ago

CLI

The record command is newly introduced in this version. You can record videos of evaluation episodes with a saved model.

$ d3rlpy record d3rlpy_logs/CQL_20210131144357/model_100.pt --env-id Hopper-v2

You can also use the wrapped environment.

$ d3rlpy record d3rlpy_logs/DQN_online_20210130170041/model_1000.pt \
  --env-header 'import gym; from d3rlpy.envs import Atari; env = Atari(gym.make("BreakoutNoFrameskip-v4"), is_eval=True)'

bugfix

  • fix saving models every step in fit_online method
  • fix Atari wrapper to reproduce the paper result
  • fix CQL and BEAR algorithms
d3rlpy - Release v0.60

Published by takuseno over 3 years ago

logo

New logo images are made for d3rlpy 🎉


ActionScaler

ActionScaler provides action scaling pre/post-processing for continuous control algorithms. Previously, actions had to be in the range [-1.0, 1.0]. From now on, you don't need to worry about the range of actions.

from d3rlpy.algos import CQL

cql = CQL(action_scaler='min_max')  # just pass the action_scaler argument

handling timeout episodes

When an episode is terminated by a timeout, the value bootstrap should not be cut off as if it were a true terminal state. From this version, you can specify episode boundaries as well as the terminal flags.

from d3rlpy.dataset import MDPDataset

observations = ...
actions = ...
rewards = ...
terminals = ... # this indicates the environmental termination
episode_terminals = ... # this indicates episode boundaries

datasets = MDPDataset(observations, actions, rewards, terminals, episode_terminals)

# if episode_terminals are omitted, terminals will be used to specify episode boundaries
# datasets = MDPDataset(observations, actions, rewards, terminals) 

In online training, you can specify this option via the timelimit_aware flag.

import gym

from d3rlpy.algos import SAC

env = gym.make('Hopper-v2')  # make sure the environment is wrapped by gym.wrappers.TimeLimit

sac = SAC()
sac.fit_online(env, timelimit_aware=True)  # this flag is True by default

reference: https://arxiv.org/abs/1712.00378

batch online training

When training with computationally expensive environments such as robotics simulators or rich 3D games, training takes a long time due to slow environment steps.
To solve this, d3rlpy supports batch online training.

import gym

from d3rlpy.algos import SAC
from d3rlpy.envs import AsyncBatchEnv

if __name__ == '__main__':  # this is necessary if you use AsyncBatchEnv
    env = AsyncBatchEnv([lambda: gym.make('Hopper-v2') for _ in range(10)])  # distribute 10 environments across different processes

    sac = SAC(use_gpu=True)
    sac.fit_batch_online(env)  # train with 10 environments concurrently

docker image

A pre-built d3rlpy Docker image is available on DockerHub.

$ docker run -it --gpus all --name d3rlpy takuseno/d3rlpy:latest bash

enhancements

  • BEAR algorithm is updated based on the official implementation
    • new mmd_kernel option is available
  • to_mdp_dataset method is added to ReplayBuffer
  • ConstantEpsilonGreedy explorer is added (see the sketch after this list)
  • d3rlpy.envs.ChannelFirst wrapper is added (thanks for reporting, @feyza-droid )
  • new dataset utility function d3rlpy.datasets.get_d4rl is added
    • this is handling timeouts inside the function
  • offline RL paper reproduction codes are added
  • smoothed moving average plot at d3rlpy plot CLI function (thanks, @pstansell )
  • user-friendly messages for assertion errors
  • better memory consumption
  • save_interval argument is added to fit_online
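A minimal usage sketch of the ConstantEpsilonGreedy explorer mentioned above; the constructor argument and buffer setup are assumptions based on the v0.x online training API.

import gym
from d3rlpy.algos import DQN
from d3rlpy.online.buffers import ReplayBuffer
from d3rlpy.online.explorers import ConstantEpsilonGreedy

env = gym.make('CartPole-v0')

dqn = DQN()
buffer = ReplayBuffer(maxlen=100000, env=env)

# act randomly with a fixed 10% probability during online training
explorer = ConstantEpsilonGreedy(0.1)

dqn.fit_online(env, buffer, explorer=explorer)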

bugfix

  • core dumps are fixed in Google Colaboratory tutorials
  • typos in some documentation pages (thanks for reporting, @pstansell )
d3rlpy - Release v0.51

Published by takuseno almost 4 years ago

minor fix

  • add typing-extensions dependency
  • update MANIFEST.in
d3rlpy - Release v0.50

Published by takuseno almost 4 years ago

typing

Now, d3rlpy is fully type-annotated, not only for better use of the library but also for a better contribution experience.

  • mypy and pylint check the type consistency and code quality.
  • due to the large number of changes needed to add type annotations, there might be regressions that are not detected by the linters.

CLI

v0.50 introduces a new command-line interface, the d3rlpy command, that helps you do more without any effort. For now, d3rlpy provides the following commands.

# plot CSV data
$ d3rlpy plot d3rlpy_logs/XXX/YYY.csv

# plot all CSV data under a directory
$ d3rlpy plot-all d3rlpy_logs/XXX

# export the saved model to inference formats (e.g. ONNX, TorchScript)
$ d3rlpy export d3rlpy_logs/XXX/model_YYY.pt

enhancements

  • faster CPU-to-GPU transfer
    • this change makes online training 2x faster
  • make the IQN Q-function more precise based on the paper

documentation

  • Add doc about SB3 integration ( thanks, @araffin )