Tensorforce: a TensorFlow library for applied reinforcement learning
APACHE-2.0 License
- Renamed agent argument `reward_preprocessing` to `reward_processing`, and in case of the Tensorforce agent moved to `reward_estimation[reward_processing]` (see sketch below)
- Added `categorical` distribution argument `skip_linear` to not add the implicit linear logits layer
- Added `Environment.num_actors()`; `Runner` uses multi-actor parallelism by default if the environment is multi-actor
- Added `Environment` function `episode_return()`, which returns the true return of the last episode, in case the cumulative sum of environment rewards is not a good metric for runner display
- Added `vectorized_environment.py` and `multiactor_environment.py` example scripts to illustrate how to set up a vectorized/multi-actor environment
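For illustration, a minimal sketch of where the relocated `reward_processing` entry now lives; the Gym CartPole setup, hyperparameters and clipping values are illustrative, not taken from these notes:

```python
from tensorforce import Agent, Environment

# Illustrative CartPole setup; the point is the relocated reward_processing entry.
environment = Environment.create(
    environment='gym', level='CartPole-v1', max_episode_timesteps=500
)
agent = Agent.create(
    agent='tensorforce', environment=environment, update=64,
    optimizer=dict(optimizer='adam', learning_rate=1e-3),
    objective='policy_gradient',
    reward_estimation=dict(
        horizon=20,
        # formerly the top-level agent argument reward_preprocessing
        reward_processing=dict(type='clipping', lower=-1.0, upper=1.0)
    )
)
```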
Published by AlexKuhnle over 3 years ago

- Agent argument `update_frequency` / `update[frequency]` now supports float values > 0.0, which specify the update frequency relative to the batch size
- Changed default value of `update_frequency` from `1.0` to `0.25` for DQN, DoubleDQN, DuelingDQN agents
- Added arguments `return_processing` and `advantage_processing` (where applicable) for all agent sub-types
- Added function `Agent.get_specification()`, which returns the agent specification as a dictionary (see the introspection sketch below)
- Added function `Agent.get_architecture()`, which returns a string representation of the network layer architecture
- Simplified module specification: e.g. `network=my_module` instead of `network=my_module.TestNetwork`, or `environment=envs.custom_env` instead of `environment=envs.custom_env.CustomEnvironment` (module file needs to be in the same directory or a sub-directory; see sketch below)
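A rough sketch of the shortened module specification, assuming hypothetical files `my_module.py` (defining a single network class) and `envs/custom_env.py` (defining a single environment class), matching the example names above:

```python
from tensorforce import Agent, Environment

# Assumes my_module.py and envs/custom_env.py exist as described in the note above.
environment = Environment.create(
    environment='envs.custom_env',   # previously 'envs.custom_env.CustomEnvironment'
    max_episode_timesteps=100
)
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    network='my_module'              # previously 'my_module.TestNetwork'
)
```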
- Added argument `single_output=True` for some policy types which, if `False`, allows the specification of additional network outputs for some/all actions via registered tensors
- `KerasNetwork` argument `model` now supports arbitrary functions as long as they return a `tf.keras.Model`
- Added layer `SelfAttention` (specification key: `self_attention`)
- Renamed `episode_rewards` as `episode_returns`, and TQDM status `reward` as `return`
- Changed run.py argument `agent` to support `Agent.load()` keyword arguments, to load an existing agent instead of creating a new one
- Added `action_masking.py` example script to illustrate an environment implementation with built-in action masking
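A minimal sketch of the introspection helpers added above; the environment and hyperparameters are illustrative:

```python
from tensorforce import Agent, Environment

environment = Environment.create(
    environment='gym', level='CartPole-v1', max_episode_timesteps=500
)
agent = Agent.create(agent='ppo', environment=environment, batch_size=10)

print(agent.get_specification())  # agent specification as a dictionary
print(agent.get_architecture())   # string summary of the network layer architecture
```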
Published by AlexKuhnle over 3 years ago

- Added agent argument `tracking` and corresponding function `tracked_tensors()` to track and retrieve the current value of predefined tensors, similar to `summarizer` for TensorBoard summaries (see sketch below)
- Added experimental values `trace_decay` and `gae_decay` for Tensorforce agent argument `reward_estimation`, soon for other agent types as well
- Added options `"early"` and `"late"` for value `estimate_advantage` of Tensorforce agent argument `reward_estimation`
- Changed default value of `Agent.act()` argument `deterministic` from `False` to `True`
- Added network type `KerasNetwork` (specification key: `keras`) as wrapper for networks specified as Keras model
- Changed `Gaussian` distribution argument `global_stddev=False` to `stddev_mode='predicted'`
- Added `Categorical` distribution argument `temperature_mode=None`
- Added option for `Function` layer argument `function` to pass a string function expression with argument "x", e.g. "(x+1.0)/2.0"
- Added summary `episode-length`, recorded as part of summary label "reward"
- Added environment vectorization support: `Environment.is_vectorizable()` and new argument `num_parallel` for `Environment.reset()`; see `tensorforce/environments/cartpole.py` for a vectorizable environment example
- `Runner` uses vectorized parallelism by default if `num_parallel > 1`, `remote=None` and the environment supports vectorization; see `examples/act_observe_vectorized.py` for more details on the act-observe interaction
- Added new vectorizable custom CartPole environment `custom_cartpole` (work in progress)
- Added environment argument `reward_shaping` to provide a simple way to modify/shape the rewards of an environment; can be specified either as a callable or as a string function expression
- Added run.py command line arguments `--checkpoints` and `--summaries` to add a comma-separated checkpoint/summary filename in addition to the directory
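A rough sketch of the new tracking interface; `tracking='all'` is assumed here as the way to request all predefined tensors, and the environment is illustrative:

```python
from tensorforce import Agent, Environment

environment = Environment.create(
    environment='gym', level='CartPole-v1', max_episode_timesteps=500
)
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    tracking='all'  # assumption: 'all' tracks every predefined tensor
)

states = environment.reset()
actions = agent.act(states=states)
print(agent.tracked_tensors())  # dict of tensor name -> current value
```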
Published by AlexKuhnle about 4 years ago
"adam"
for Tensorforce agent argument optimizer
(since default optimizer argument learning_rate
removed, see below)"minimum"
for Tensorforce agent argument memory
, use None
insteaddqn
/double_dqn
/dueling_dqn
agent argument huber_loss
from 0.0
to None
0.999
for exponential_normalization
layer argument decay
batch_normalization
(generally should only be used for the agent arguments reward_processing[return_processing]
and reward_processing[advantage_processing]
)exponential/instance_normalization
layer argument only_mean
with default False
exponential/instance_normalization
layer argument min_variance
with default 1e-4
1e-3
for optimizer argument learning_rate
gradient_norm_clipping
from 1.0
to None
(no gradient clipping)doublecheck_step
and corresponding argument doublecheck_update
for optimizer wrapperlinesearch_step
optimizer argument accept_ratio
natural_gradient
optimizer argument return_improvement_estimate
saver
as string, which is interpreted as saver[directory]
with otherwise default valuessaver[frequency]
as 10
(save model every 10 updates by default)saver[max_checkpoints]
from 5
to 10
summarizer
as string, which is interpreted as summarizer[directory]
with otherwise default valuessummarizer
from summarizer[labels]
to summarizer[summaries]
(use of the term "label" due to earlier version, outdated and confusing by now)summarizer[summaries] = "all"
to include only numerical summaries, so all summaries except "graph"summarizer[summaries]
from ["graph"]
to "all"
summarizer[max_summaries]
from 5
to 7
(number of different colors in TensorBoard)summarizer[filename]
to agent argument summarizer
recorder
as string, which is interpreted as recorder[directory]
with otherwise default values--checkpoints
/--summaries
/--recordings
command line argument to enable saver/summarizer/recorder agent argument specification separate from core agent configurationsave_load_agent.py
example script to illustrate regular agent saving and loadinggradient_norm_clipping
not being applied correctlyexponential_normalization
layer not updating moving mean and variance correctlyrecent
memory for timestep-based updates sometimes sampling invalid memory indicesPublished by AlexKuhnle about 4 years ago
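A minimal sketch of the new string shorthand for `saver`/`summarizer`/`recorder`; the directory names and agent configuration are illustrative:

```python
from tensorforce import Agent, Environment

environment = Environment.create(
    environment='gym', level='CartPole-v1', max_episode_timesteps=500
)
# A plain string is shorthand for the corresponding dict(directory=...) specification.
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    saver='model-checkpoints',   # same as saver=dict(directory='model-checkpoints')
    summarizer='summaries',      # same as summarizer=dict(directory='summaries')
    recorder='recordings'        # same as recorder=dict(directory='recordings')
)
```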
Published by AlexKuhnle about 4 years ago

- Removed agent arguments `execution`, `buffer_observe`, `seed`
- Renamed agent arguments `baseline_policy`/`baseline_network`/`critic_network` to `baseline`/`critic`
- Renamed agent `reward_estimation` arguments `estimate_horizon` to `predict_horizon_values`, `estimate_actions` to `predict_action_values`, `estimate_terminal` to `predict_terminal_values`
- Renamed agent argument `preprocessing` to `state_preprocessing`
- New default state preprocessing `linear_normalization`
- Moved reward processing from `preprocessing` to `reward_preprocessing` and `reward_estimation[return_/advantage_processing]`
- New agent argument `config` with values `buffer_observe`, `enable_int_action_masking`, `seed`
- Renamed PPO/TRPO/DPG argument `critic_network`/`_optimizer` to `baseline`/`baseline_optimizer` (see sketch below)
- Renamed PPO argument `optimization_steps` to `multi_step`
- New TRPO argument `subsampling_fraction`
- Changed agent argument `use_beta_distribution` default to false
- Added double DQN agent (`double_dqn`)
- Removed `Agent.act()` argument `evaluation`
- Removed agent function arguments `query` (functionality removed)
- Agent `save`/`load` functions and `saver` argument changed
- Default behavior when specifying `saver` is not to load the agent, unless the agent is created via `Agent.load`
- Agent `summarizer` argument changed; some summary labels and other options removed
- Renamed RNN layers `internal_{rnn/lstm/gru}` to `rnn/lstm/gru` and `rnn/lstm/gru` to `input_{rnn/lstm/gru}`
- Renamed `auto` network argument `internal_rnn` to `rnn`
- Renamed `(internal_)rnn/lstm/gru` layer argument `length` to `horizon`
- Renamed `update_modifier_wrapper` to `optimizer_wrapper`
- Renamed `optimizing_step` to `linesearch_step`, and `UpdateModifierWrapper` argument `optimizing_iterations` to `linesearch_iterations`
- Optimizer `subsampling_step` now accepts both absolute (int) and relative (float) fractions
- Objective `policy_gradient` argument `ratio_based` renamed to `importance_sampling`
- New objectives `state_value` and `action_value`
- Added `Gaussian` distribution arguments `global_stddev` and `bounded_transform` (for improved bounded action space handling)
- Changed memory `device` argument default to `CPU:0`
- `Agent.create()` accepts an act-function as `agent` argument for recording
- Changed policy handling: `parametrized_distributions`, new default policies `parametrized_state/action_value`
- Combined `long` and `int` type
- Environments are now always wrapped in the `EnvironmentWrapper` class
- Changed `tune.py` arguments
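A rough sketch of the renamed `baseline`/`baseline_optimizer` arguments for PPO; the network size and learning rate are illustrative, not defaults:

```python
from tensorforce import Agent, Environment

environment = Environment.create(
    environment='gym', level='CartPole-v1', max_episode_timesteps=500
)
# baseline / baseline_optimizer replace the former critic_network / critic_optimizer.
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    baseline=dict(type='auto', size=32, depth=1),
    baseline_optimizer=dict(optimizer='adam', learning_rate=1e-3)
)
```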
Published by AlexKuhnle over 4 years ago

- Changed independent mode of `agent.act` to use final values of dynamic hyperparameters and avoid TensorFlow conditions
- Extended `"tensorflow"` format of `agent.save` to include an optimized Protobuf model with an act-only graph as `.pb` file, and `Agent.load` format `"pb-actonly"` to load an act-only agent based on the Protobuf model (see sketch below)
- New `summarizer` argument value `custom` to specify the summary type, and `Agent.summarize(...)` to record summary values
- Argument `batch_size` now mandatory for all agent classes
- Removed `Estimator` argument `capacity`, now always automatically inferred
- Internal changes related to agent arguments `memory`, `update` and `reward_estimation`
- Changed default `bias` and `activation` argument of some layers
- Fixed issues with the `sequence` preprocessor
- DQN-style agents are now restricted to `int` actions only
- Added `use_beta_distribution` argument with default `True` to many agents and the `ParametrizedDistributions` policy, so the default can be changed
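A rough sketch of the extended save/load flow described above, using the current import paths; the directory name and agent configuration are illustrative:

```python
from tensorforce import Agent, Environment

environment = Environment.create(
    environment='gym', level='CartPole-v1', max_episode_timesteps=500
)
agent = Agent.create(agent='ppo', environment=environment, batch_size=10)

# "tensorflow" format now also writes an act-only graph as a .pb file
agent.save(directory='saved-model', format='tensorflow')
agent.close()

# reload the act-only agent from the Protobuf model
agent = Agent.load(directory='saved-model', format='pb-actonly')
actions = agent.act(states=environment.reset())
```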
Published by AlexKuhnle over 4 years ago

- DQN/DuelingDQN/DPG argument `memory` now required to be specified explicitly, plus `update_frequency` default changed
- Removed `conv1d/conv2d_transpose` layers due to TensorFlow gradient problems
- `Agent`, `Environment` and `Runner` can now be imported via `from tensorforce import ...`
- New generic `reshape` layer
- Support for batched versions of `Agent.act` and `Agent.observe`
- Support for parallelized remote environments based on Python's `multiprocessing` and `socket` (replacing `tensorforce/contrib/socket_remote_env/` and `tensorforce/environments/environment_process_wrapper.py`), available via `Environment.create(...)`, `Runner(...)` and `run.py`
- Removed `ParallelRunner` and merged functionality with `Runner`
- Changed `run.py` arguments
- Changed independent mode of `Agent.act`: additional argument `internals` and corresponding return value, initial internals via `Agent.initial_internals()`, `Agent.reset()` not required anymore (see sketch below)
- Removed `deterministic` argument for `Agent.act` unless independent mode
- Added `format` argument to `save`/`load`/`restore` with supported formats `tensorflow`, `numpy` and `hdf5`
- Changed `save` argument `append_timestep` to `append` with default `None` (instead of `'timesteps'`)
- Added `get_variable` and `assign_variable` agent functions
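A minimal sketch of the changed independent act mode with explicit internals; the agent and environment configuration are illustrative:

```python
from tensorforce import Agent, Environment

environment = Environment.create(
    environment='gym', level='CartPole-v1', max_episode_timesteps=500
)
agent = Agent.create(agent='ppo', environment=environment, batch_size=10)

# Independent acting now passes internals explicitly; Agent.reset() is not required.
internals = agent.initial_internals()
states = environment.reset()
actions, internals = agent.act(states=states, internals=internals, independent=True)
```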
Published by AlexKuhnle almost 5 years ago

- Added optional `memory` argument to various agents
- Improved summary labels, particularly `"entropy"` and `"kl-divergence"`
- `linear` layer now accepts tensors of rank 1 to 3
- Added transposed convolution layers (`conv1d/2d_transpose`)
- Changes to `tensorforce/contrib/`
- Added `save_best_agent` argument to specify a best-model directory different from the `saver` configuration
- `saver` argument `steps` removed and `seconds` renamed to `frequency`
- Moved `Parallel/Runner` argument `max_episode_timesteps` from `run(...)` to the constructor (see sketch below)
- New `Environment.create(...)` argument `max_episode_timesteps`
- Summaries `graph`, `variables` and `variables-histogram` temporarily not working
- Changed unit of `target_sync_frequency` from timesteps to updates for the `dqn` and `dueling_dqn` agent
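A rough sketch of `max_episode_timesteps` now being specified at construction time, using the current import paths; values are illustrative:

```python
from tensorforce import Agent, Environment, Runner

# max_episode_timesteps is set when creating the environment (and on the Runner
# constructor rather than on run(...)).
environment = Environment.create(
    environment='gym', level='CartPole-v1', max_episode_timesteps=500
)
agent = Agent.create(agent='ppo', environment=environment, batch_size=10)

runner = Runner(agent=agent, environment=environment)
runner.run(num_episodes=100)
runner.close()
```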
Published by AlexKuhnle about 5 years ago

- Added `updates` and renamed `timesteps`/`episodes` counter for agents and runners
- Renamed `critic_{network,optimizer}` argument to `baseline_{network,optimizer}`
- Added Actor-Critic (`ac`), Advantage Actor-Critic (`a2c`) and Dueling DQN (`dueling_dqn`) agents (see sketch below)
- Added new layer type (`block`) for easier sharing of layer blocks
- Renamed `PolicyAgent/-Model` to `TensorforceAgent/-Model`
- New `Agent.load(...)` function, saving includes agent specification
- Removed `PolicyAgent` argument `(baseline-)network`
- Added `temperature` argument
- Removed `"same"` and `"equal"` options for `baseline_*` arguments and changed internal baseline handling
- Combined `state/action_value` into a `value` objective with argument `value` either `"state"` or `"action"`
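A minimal sketch instantiating one of the newly added agent types; hyperparameters are illustrative:

```python
from tensorforce import Agent, Environment

environment = Environment.create(
    environment='gym', level='CartPole-v1', max_episode_timesteps=500
)
# Advantage Actor-Critic, one of the agent types added in this release.
agent = Agent.create(agent='a2c', environment=environment, batch_size=10)
```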
Published by AlexKuhnle about 5 years ago

- Agent:
    - Agents need to be initialized via `agent.initialize()` before application
    - States/actions of type `int` require an entry `num_values` (instead of `num_actions`)
    - `Agent.from_spec()` changed and renamed to `Agent.create()` (see sketch below)
    - `Agent.act()` argument `fetch_tensors` changed and renamed to `query`, `index` renamed to `parallel`, `buffered` removed
    - `Agent.observe()` argument `index` renamed to `parallel`
    - `Agent.atomic_observe()` removed
    - `Agent.save/restore_model()` renamed to `Agent.save/restore()`
- Agent arguments:
    - `update_mode` renamed to `update`
    - `states_preprocessing` and `reward_preprocessing` changed and combined to `preprocessing`
    - `actions_exploration` changed and renamed to `exploration`
    - `execution` entry `num_parallel` replaced by a separate argument `parallel_interactions`
    - `batched_observe` and `batching_capacity` replaced by argument `buffer_observe`
    - `scope` renamed to `name`
- DQN and similar agents:
    - `update_mode` replaced by `batch_size`, `update_frequency` and `start_updating`
    - `optimizer` removed, implicitly defined as `'adam'`, `learning_rate` added
    - `memory` defines the capacity of the implicitly defined memory `'replay'`
    - `double_q_model` removed (temporarily)
- Policy gradient agents:
    - New mandatory argument `max_episode_timesteps`
    - `update_mode` replaced by `batch_size` and `update_frequency`
    - `memory` removed
    - `baseline_mode` removed
    - `baseline` argument changed and renamed to `critic_network`
    - `baseline_optimizer` renamed to `critic_optimizer`
    - `gae_lambda` removed (temporarily)
- PPO agent:
    - `step_optimizer` removed, implicitly defined as `'adam'`, `learning_rate` added
- TRPO agent:
    - `cg_*` and `ls_*` arguments removed
- VPG agent:
    - `optimizer` removed, implicitly defined as `'adam'`, `learning_rate` added
- Environment:
    - Properties `states` and `actions` are now functions `states()` and `actions()`
    - States/actions of type `int` require an entry `num_values` (instead of `num_actions`)
    - New function `Environment.max_episode_timesteps()`
    - Contrib environments moved to `tensorforce.environments`
- Other:
    - New `run()` API for `Runner` and `ParallelRunner`
    - `ThreadedRunner` removed
    - `examples` folder (including `configs`) removed, apart from `quickstart.py`
    - New `benchmarks` folder to replace parts of the old `examples` folder
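A minimal sketch of the new explicit state/action specification with `num_values`; shapes and values are illustrative:

```python
from tensorforce import Agent

# Explicit states/actions specification with the new num_values entry
# (previously num_actions).
agent = Agent.create(
    agent='ppo',
    states=dict(type='float', shape=(4,)),
    actions=dict(type='int', num_values=2),
    max_episode_timesteps=500,
    batch_size=10
)
```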
Published by AlexKuhnle about 5 years ago