d3rlpy

An offline deep reinforcement learning library

MIT License


d3rlpy - Release v0.41

Published by takuseno almost 4 years ago

Algorithm

Off-Policy Evaluation

Off-policy evaluation (OPE) is a method to evaluate policy performance using only the offline dataset.

# train policy
from d3rlpy.algos import CQL
from d3rlpy.datasets import get_pybullet
dataset, env = get_pybullet('hopper-bullet-mixed-v0')
cql = CQL()
cql.fit(dataset.episodes)

# Off-Policy Evaluation
from d3rlpy.ope import FQE
from d3rlpy.metrics.scorer import soft_opc_scorer
from d3rlpy.metrics.scorer import initial_state_value_estimation_scorer
fqe = FQE(algo=cql)
fqe.fit(dataset.episodes,
        eval_episodes=dataset.episodes,
        scorers={
            'soft_opc': soft_opc_scorer(1000),
            'init_value': initial_state_value_estimation_scorer
        })

Q Function Factory

d3rlpy provides flexible control over Q functions through the Q function factory. Following this change, the previous q_func_type argument has been renamed to q_func_factory.

from d3rlpy.algos import DQN
from d3rlpy.q_functions import QRQFunctionFactory

# initialize Q function factory
q_func_factory = QRQFunctionFactory(n_quantiles=32)

# give it to algorithm object
dqn = DQN(q_func_factory=q_func_factory)

You can also pass the Q function name as a string.

dqn = DQN(q_func_factory='qr')

You can also make your own Q function factory. The built-in factories currently correspond to the mean, quantile regression, implicit quantile network, and fully-parameterized quantile function Q functions.

EncoderFactory

The encoder factory can also be specified by its name as a string; here the dense encoder is selected.

from d3rlpy.algos import DQN

dqn = DQN(encoder_factory='dense')

N-step TD calculation

d3rlpy supports N-step TD calculation for ALL algorithms. You can pass the n_steps argument to configure this parameter.

from d3rlpy.algos import DQN

dqn = DQN(n_steps=5) # n_steps=1 by default

Paper reproduction scripts

d3rlpy supports many algorithms across both online and offline paradigms. Although d3rlpy was originally designed for industrial practitioners, academic research remains important for pushing deep reinforcement learning forward. Currently, reproduction scripts for online DQN variants are available.

The evaluation results will also be available soon.

enhancements

  • build_with_dataset and build_with_env methods are added to algorithm objects (see the sketch below)
  • shuffle flag is added to the fit method (thanks, @jamartinh)
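
A minimal sketch of how these additions might be used together, assuming the hopper-bullet-mixed-v0 dataset from the OPE example above:

# a minimal sketch of build_with_dataset / build_with_env and the shuffle flag
from d3rlpy.algos import CQL
from d3rlpy.datasets import get_pybullet

dataset, env = get_pybullet('hopper-bullet-mixed-v0')

cql = CQL()

# build the underlying networks from the dataset (or the environment) without training
cql.build_with_dataset(dataset)
# cql.build_with_env(env)  # alternative when only the environment is available

# shuffle controls whether transitions are shuffled at every epoch
cql.fit(dataset.episodes, n_epochs=1, shuffle=True)
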
d3rlpy - Release v0.40

Published by takuseno almost 4 years ago

Algorithms

  • Support the discrete version of Soft Actor-Critic
  • fit_online now takes an n_steps argument instead of n_epochs to allow exact reproduction of the papers (see the sketch below).
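
A minimal sketch of both changes, assuming a Gym CartPole-v0 environment and the ReplayBuffer from d3rlpy.online.buffers:

# a sketch assuming CartPole-v0 and the online ReplayBuffer
import gym

from d3rlpy.algos import DiscreteSAC
from d3rlpy.online.buffers import ReplayBuffer

env = gym.make('CartPole-v0')

# discrete version of Soft Actor-Critic
sac = DiscreteSAC()

buffer = ReplayBuffer(maxlen=100000, env=env)

# fit_online now takes the total number of environment steps
sac.fit_online(env, buffer, n_steps=100000)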

OptimizerFactory

d3rlpy provides more flexible control over optimizer configuration via OptimizerFactory.

from d3rlpy.optimizers import AdamFactory
from d3rlpy.algos import DQN

dqn = DQN(optim_factory=AdamFactory(weight_decay=1e-4))

See more at https://d3rlpy.readthedocs.io/en/v0.40/references/optimizers.html .

EncoderFactory

d3rlpy provides more flexible control over the neural network architecture via EncoderFactory.

from d3rlpy.algos import DQN
from d3rlpy.encoders import VectorEncoderFactory

# encoder factory
encoder_factory = VectorEncoderFactory(hidden_units=[300, 400], activation='tanh')

# set EncoderFactory
dqn = DQN(encoder_factory=encoder_factory)

You can also build your own encoders.

import torch
import torch.nn as nn

from d3rlpy.encoders import EncoderFactory

# your own neural network
class CustomEncoder(nn.Module):
    def __init__(self, observation_shape, feature_size):
        super().__init__()
        self.feature_size = feature_size
        self.fc1 = nn.Linear(observation_shape[0], 64)
        self.fc2 = nn.Linear(64, feature_size)

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        h = torch.relu(self.fc2(h))
        return h

    # THIS IS IMPORTANT!
    def get_feature_size(self):
        return self.feature_size

# your own encoder factory
class CustomEncoderFactory(EncoderFactory):
    TYPE = 'custom' # this is necessary

    def __init__(self, feature_size):
        self.feature_size = feature_size

    def create(self, observation_shape, action_size=None, discrete_action=False):
        return CustomEncoder(observation_shape, self.feature_size)

    def get_params(self, deep=False):
        return {
            'feature_size': self.feature_size
        }

dqn = DQN(encoder_factory=CustomEncoderFactory(feature_size=64))

See more at https://d3rlpy.readthedocs.io/en/v0.40/references/network_architectures.html .

Stable Baselines 3 wrapper

bugfix

  • Fix the memory leak problem in fit_online.
    • Now you can train online algorithms with a large replay buffer for image observations.
  • Fix preprocessing in CQL.
  • Fix the ColorJitter augmentation.

installation

PyPI

  • As of this version, d3rlpy officially supports Windows.
  • Binary packages for each platform are built and uploaded via GitHub Actions, so you no longer need to install Cython to install this package from PyPI.

Anaconda

  • As of the previous version, d3rlpy is available on conda-forge.
d3rlpy - Release v0.32

Published by takuseno almost 4 years ago

This version introduces a hotfix.

  • ⚠️ Fix a significant bug in online training with image observations.
d3rlpy - Release v0.31

Published by takuseno almost 4 years ago

This version introduces minor changes.

  • Move the n_epochs argument to the fit method (see the sketch below).
  • Fix scikit-learn compatibility issues.
  • Fix zero-division error during online training.
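
A minimal sketch of the new argument placement, assuming the get_cartpole dataset helper:

# a sketch assuming the cartpole dataset helper
from d3rlpy.algos import DQN
from d3rlpy.datasets import get_cartpole

dataset, env = get_cartpole()

dqn = DQN()

# n_epochs is now passed to fit rather than to the constructor
dqn.fit(dataset.episodes, n_epochs=10)
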
d3rlpy - Release version v0.30

Published by takuseno almost 4 years ago

Algorithm

  • Support Advantage-Weighted Actor-Critic (AWAC); see the sketch below
  • The fit_online method is available as a convenient alias for the d3rlpy.online.iterators.train function.
  • An action unnormalization problem in AWR is fixed.
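
A minimal sketch of offline pretraining with AWAC followed by online fine-tuning via fit_online, assuming the hopper-bullet-mixed-v0 dataset and the ReplayBuffer from d3rlpy.online.buffers:

# a sketch assuming the hopper-bullet-mixed-v0 dataset and the online ReplayBuffer
from d3rlpy.algos import AWAC
from d3rlpy.datasets import get_pybullet
from d3rlpy.online.buffers import ReplayBuffer

dataset, env = get_pybullet('hopper-bullet-mixed-v0')

awac = AWAC()

# offline pretraining
awac.fit(dataset.episodes)

# online fine-tuning; fit_online wraps d3rlpy.online.iterators.train
buffer = ReplayBuffer(maxlen=100000, env=env)
awac.fit_online(env, buffer, n_epochs=10)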

Metrics

⚠️ MDPDataset

  • d3rlpy.dataset module is now implemented with Cython in order to speed up memory copies.
  • The following operations are significantly faster than in the previous version.
    • creating TransitionMiniBatch objects (see the sketch below)
    • frame stacking via the n_frames argument
    • lambda return calculation in the AWR algorithms
  • This change makes Atari training approximately 6% faster.
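
A minimal sketch of building a TransitionMiniBatch directly, assuming the get_cartpole dataset helper:

# a sketch assuming the cartpole dataset helper
from d3rlpy.dataset import TransitionMiniBatch
from d3rlpy.datasets import get_cartpole

dataset, env = get_cartpole()

# batch the transitions of the first episode; this construction is one of the
# operations sped up by the Cython implementation
transitions = dataset.episodes[0].transitions
batch = TransitionMiniBatch(transitions)

print(batch.observations.shape)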
d3rlpy - Release version v0.23

Published by takuseno about 4 years ago

Algorithm

  • Support Advantage-Weighted Regression (AWR)
  • n_frames option is added to all algorithms
    • the n_frames option controls frame stacking for image observations
  • eval_results_ property is added to all algorithms
    • evaluation results can be retrieved from eval_results_ after training (see the sketch below)
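
A minimal sketch of both options, assuming the get_atari dataset helper and the breakout-mixed-v0 dataset from d4rl-atari:

# a sketch assuming the Atari dataset helper and breakout-mixed-v0
from d3rlpy.algos import DQN
from d3rlpy.datasets import get_atari
from d3rlpy.metrics.scorer import td_error_scorer

dataset, env = get_atari('breakout-mixed-v0')

# stack 4 frames for the image observations
dqn = DQN(n_frames=4)

dqn.fit(dataset.episodes,
        eval_episodes=dataset.episodes,
        scorers={'td_error': td_error_scorer})

# per-epoch evaluation results are stored after training
print(dqn.eval_results_)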

MDPDataset

  • prev_transition and next_transition properties are added to d3rlpy.dataset.Transition.
    • these properties are used for frame stacking and Monte Carlo return calculation in AWR (see the sketch below)
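
A minimal sketch of walking along an episode via these properties, assuming the get_cartpole dataset helper:

# a sketch assuming the cartpole dataset helper
from d3rlpy.datasets import get_cartpole

dataset, env = get_cartpole()

transition = dataset.episodes[0].transitions[0]

# prev_transition and next_transition link neighboring steps within an episode
print(transition.next_transition.observation)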

Document

  • new tutorial page is added
d3rlpy - Release version v0.22

Published by takuseno about 4 years ago

Support ONNX export

The trained policy can now be exported in ONNX format as well as TorchScript.

cql.save_policy('policy.onnx', as_onnx=True)
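
TorchScript export remains available through the same method; the snippet below also sketches loading the exported ONNX policy with onnxruntime, where the observation size is an assumed placeholder.

# TorchScript export
cql.save_policy('policy.pt')

# a sketch of running the exported ONNX policy with onnxruntime;
# the observation size (11) is an assumed placeholder
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession('policy.onnx')
observation = np.random.rand(1, 11).astype(np.float32)
action = session.run(None, {session.get_inputs()[0].name: observation})[0]
print(action)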

Support more data augmentations

  • data augmentations for vector observations
  • ColorJitter augmentation for image observations
d3rlpy - Release version v0.2

Published by takuseno about 4 years ago

  • support model-based algorithms
    • Model-based Offline Policy Optimization
  • support data augmentation (for image observations)
    • Data-regularized Q-learning
  • a lot of improvements
    • more dataset statistics
    • more options to customize the neural network architecture
    • optimized default learning rates
    • etc.
d3rlpy - First release!

Published by takuseno about 4 years ago

  • online algorithms
    • Deep Q-Network (DQN)
    • Double DQN
    • Deep Deterministic Policy Gradients (DDPG)
    • Twin Delayed Deep Deterministic Policy Gradients (TD3)
    • Soft Actor-Critic (SAC)
  • data-driven algorithms
    • Batch-Constrained Q-learning (BCQ)
    • Bootstrapping Error Accumulation Reduction (BEAR)
    • Conservative Q-Learning (CQL)
  • Q functions
    • mean
    • Quantile Regression
    • Implicit Quantile Network
    • Fully-parametrized Quantile Function (experimental)
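
A minimal sketch of selecting one of these Q functions via the q_func_type argument used in this release (later renamed to q_func_factory, as noted in the v0.41 notes above):

# a sketch: q_func_type selects the Q function in this release
from d3rlpy.algos import DQN

dqn = DQN(q_func_type='qr')  # quantile regression Q function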