An offline deep reinforcement learning library
MIT License
Published by takuseno almost 4 years ago
Off-policy evaluation (OPE) is a method to evaluate policy performance using only the offline dataset.
# train policy
from d3rlpy.algos import CQL
from d3rlpy.datasets import get_pybullet
dataset, env = get_pybullet('hopper-bullet-mixed-v0')
cql = CQL()
cql.fit(dataset.episodes)
# Off-Policy Evaluation
from d3rlpy.ope import FQE
from d3rlpy.metrics.scorer import soft_opc_scorer
from d3rlpy.metrics.scorer import initial_state_value_estimation_scorer
fqe = FQE(algo=cql)
fqe.fit(dataset.episodes,
        eval_episodes=dataset.episodes,
        scorers={
            'soft_opc': soft_opc_scorer(1000),
            'init_value': initial_state_value_estimation_scorer
        })
d3rlpy provides flexible controls over Q functions through the Q function factory. Following this change, the previous q_func_type argument was renamed to q_func_factory.
from d3rlpy.algos import DQN
from d3rlpy.q_functions import QRQFunctionFactory
# initialize Q function factory
q_func_factory = QRQFunctionFactory(n_quantiles=32)
# give it to algorithm object
dqn = DQN(q_func_factory=q_func_factory)
You can also pass the Q function name as a string.
dqn = DQN(q_func_factory='qr')
You can also build your own Q function factory; see the documentation for the list of currently supported factories.
Encoder factories can likewise be specified by name; for example:
from d3rlpy.algos import DQN
dqn = DQN(encoder_factory='dense')
d3rlpy supports N-step TD calculation for ALL algorithms. You can pass the n_steps argument to configure this parameter.
from d3rlpy.algos import DQN
dqn = DQN(n_steps=5) # n_steps=1 by default
d3rlpy supports many algorithms across both online and offline paradigms. Originally, d3rlpy was designed for industrial practitioners, but academic research is still important to push deep reinforcement learning forward. Currently, reproduction codes for the online DQN variants are available. The evaluation results will also be available soon.
- build_with_dataset and build_with_env methods are added to algorithm objects (a usage sketch follows this list)
- shuffle flag is added to the fit method (thanks, @jamartinh)
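A minimal usage sketch of these additions (the CartPole-v0 environment and the dataset variable are illustrative assumptions, not part of the release):

import gym
from d3rlpy.algos import DQN

dqn = DQN()

# build networks from an environment specification before training
env = gym.make('CartPole-v0')
dqn.build_with_env(env)

# alternatively, build them from an offline dataset
# dqn.build_with_dataset(dataset)

# the new flag disables shuffling of transitions during fitting
# dqn.fit(dataset.episodes, shuffle=False)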
Published by takuseno almost 4 years ago
fit_online now takes an n_steps argument instead of n_epochs for the complete reproduction of the papers.
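A minimal sketch of the new signature (the ReplayBuffer setup and the step count are illustrative assumptions):

import gym
from d3rlpy.algos import DQN
from d3rlpy.online.buffers import ReplayBuffer

env = gym.make('CartPole-v0')
dqn = DQN()
buffer = ReplayBuffer(maxlen=100000, env=env)

# n_steps replaces n_epochs for online training
dqn.fit_online(env, buffer, n_steps=100000)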
d3rlpy provides more flexible controls over optimizer configuration via OptimizerFactory.
from d3rlpy.optimizers import AdamFactory
from d3rlpy.algos import DQN
dqn = DQN(optim_factory=AdamFactory(weight_decay=1e-4))
See more at https://d3rlpy.readthedocs.io/en/v0.40/references/optimizers.html .
d3rlpy provides more flexible controls for the neural network architecture via EncoderFactory.
from d3rlpy.algos import DQN
from d3rlpy.encoders import VectorEncoderFactory
# encoder factory
encoder_factory = VectorEncoderFactory(hidden_units=[300, 400], activation='tanh')
# set EncoderFactory
dqn = DQN(encoder_factory=encoder_factory)
You can also build your own encoders.
import torch
import torch.nn as nn
from d3rlpy.encoders import EncoderFactory

# your own neural network
class CustomEncoder(nn.Module):
    def __init__(self, observation_shape, feature_size):
        super().__init__()
        self.feature_size = feature_size
        self.fc1 = nn.Linear(observation_shape[0], 64)
        self.fc2 = nn.Linear(64, feature_size)

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        h = torch.relu(self.fc2(h))
        return h

    # THIS IS IMPORTANT!
    def get_feature_size(self):
        return self.feature_size

# your own encoder factory
class CustomEncoderFactory(EncoderFactory):
    TYPE = 'custom'  # this is necessary

    def __init__(self, feature_size):
        self.feature_size = feature_size

    def create(self, observation_shape, action_size=None, discrete_action=False):
        return CustomEncoder(observation_shape, self.feature_size)

    def get_params(self, deep=False):
        return {
            'feature_size': self.feature_size
        }

dqn = DQN(encoder_factory=CustomEncoderFactory(feature_size=64))
See more at https://d3rlpy.readthedocs.io/en/v0.40/references/network_architectures.html .
Published by takuseno almost 4 years ago
This version introduces a hotfix.
Published by takuseno almost 4 years ago
This version introduces minor changes.
- The n_epochs argument now goes to the fit method (a short sketch follows).
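A minimal sketch of the change (dataset is an assumed MDPDataset; the epoch count is illustrative):

from d3rlpy.algos import DQN

dqn = DQN()
# n_epochs is now given at training time
dqn.fit(dataset.episodes, n_epochs=100)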
Published by takuseno almost 4 years ago
- fit_online method is available as a convenient alias to the d3rlpy.online.iterators.train function
- d3rlpy.dataset module is now implemented with Cython in order to speed up memory copies
- TransitionMiniBatch object supports frame stacking via the n_frames argument (see the sketch after this list)
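A hedged sketch of mini-batch frame stacking (transitions is an assumed list of d3rlpy.dataset.Transition objects; the exact constructor arguments may differ):

from d3rlpy.dataset import TransitionMiniBatch

# stack the last 4 frames when building a mini-batch (illustrative value)
batch = TransitionMiniBatch(transitions, n_frames=4)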
Published by takuseno about 4 years ago
- n_frames option is added to all algorithms; it controls frame stacking for image observations (see the sketch after this list)
- eval_results_ property is added to all algorithms so that you can check evaluation metrics after training
- prev_transition and next_transition properties are added to d3rlpy.dataset.Transition
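A minimal sketch of these additions (training itself is elided; episode is an assumed d3rlpy.dataset.Episode):

from d3rlpy.algos import DQN

# stack 4 frames for image observations (illustrative value)
dqn = DQN(n_frames=4)

# ... after dqn.fit(...) ...
print(dqn.eval_results_)  # evaluation metrics recorded during training

# walk forward through a trajectory via the new linked properties
transition = episode.transitions[0]
while transition.next_transition is not None:
    transition = transition.next_transition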
Published by takuseno about 4 years ago
Now, the trained policy can be exported as ONNX as well as TorchScript:
cql.save_policy('policy.onnx', as_onnx=True)
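For comparison, a short sketch of both export paths (assuming cql is a trained algorithm object; file names are arbitrary):

cql.save_policy('policy.pt')  # TorchScript (default)
cql.save_policy('policy.onnx', as_onnx=True)  # ONNX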