Benchmark Efficient Reinforcement Learning with/without Demonstrations
Efficient reinforcement learning for robotic control in simulation (the Reacher environment).
We compare several existing methods for more efficient and robust RL training, including:
- Better Initialization (pre-trained with supervised learning);
- Residual Policy Learning;
- DDPG from Demonstrations;
- Reptile and MAML (across tasks).
Project Development:
To Run:
- Python 3.5
- PyTorch & TensorFlow
Document:
- Full Document
- PPT
Benchmarks:
- DDPG, PPO for Reacher Environment in simulation
- Inverse Kinematics of Reacher Environment
- Supervised learning for initialization of DDPG, PPO
- Residual policy learning for initialization of DDPG, PPO
- Reptile + PPO, MAML + PPO
- DDPG from demonstrations
Contents:
Basics:
- ./origin_env_code: Basic code for the Reacher environment and inverse kinematics.
- ./DDPG4Reacher, ./DDPG4Reacher2: DDPG algorithm for the Reacher environment (see the sketch below).
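For orientation, here is a minimal sketch of the core DDPG update (critic regression to a TD target, deterministic policy gradient for the actor, Polyak-averaged target networks). The network objects, optimizers, and hyperparameters are illustrative assumptions, not the actual code in ./DDPG4Reacher:

```python
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=0.99, tau=0.005):
    """One DDPG gradient step on a batch of (s, a, r, s', done) tensors."""
    s, a, r, s2, done = batch

    # Critic: regress Q(s, a) toward the one-step TD target.
    with torch.no_grad():
        q_target = r + gamma * (1 - done) * target_critic(s2, target_actor(s2))
    critic_loss = F.mse_loss(critic(s, a), q_target)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: maximize the critic's value of the actor's own actions.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Polyak-average the target networks toward the online networks.
    for net, target in ((actor, target_actor), (critic, target_critic)):
        for p, tp in zip(net.parameters(), target.parameters()):
            tp.data.mul_(1.0 - tau).add_(tau * p.data)
```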
- ./Inverse: Inverse kinematics for generating demonstration data (see the IK sketch below).
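For a 2-joint planar reacher, the inverse kinematics has a closed form via the law of cosines. A minimal sketch, where the link lengths are assumed placeholder values rather than the environment's actual constants:

```python
import numpy as np

def two_link_ik(x, y, l1=0.1, l2=0.11):
    """Joint angles (q1, q2) placing a 2-link planar arm's tip at (x, y).

    Returns the elbow-down solution; clipping guards against targets
    marginally outside the reachable annulus due to rounding.
    """
    cos_q2 = np.clip((x * x + y * y - l1 ** 2 - l2 ** 2) / (2 * l1 * l2),
                     -1.0, 1.0)
    q2 = np.arccos(cos_q2)
    q1 = np.arctan2(y, x) - np.arctan2(l2 * np.sin(q2), l1 + l2 * np.cos(q2))
    return q1, q2
```

Driving the joints toward the returned angles and recording the visited state-action pairs yields the demonstration trajectories used by the methods below.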
Efficient RL:
- Policy Replacement (Behavior Cloning) in ./DDPG_Inverse: Train an initialization policy for RL (DDPG) via supervised learning on samples generated from inverse kinematics (already generated); a behavior-cloning sketch follows.
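A minimal behavior-cloning sketch of this initialization step, assuming the demonstrations are available as state/action tensors (the function name and hyperparameters are hypothetical):

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

def pretrain_actor(actor, demo_states, demo_actions,
                   epochs=50, batch_size=64, lr=1e-3):
    """Regress the actor onto demonstration actions (behavior cloning),
    producing the initialization policy that DDPG then fine-tunes."""
    opt = torch.optim.Adam(actor.parameters(), lr=lr)
    loader = DataLoader(TensorDataset(demo_states, demo_actions),
                        batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for s, a in loader:
            loss = F.mse_loss(actor(s), a)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return actor
```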
- Feeding Demonstrations into the Memory Buffer in ./DDPGfD: Train DDPG to learn from demonstrations by feeding demonstration trajectories directly into a separate memory buffer used during training; a mixed-sampling sketch follows.
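A sketch of batch construction under this scheme: demonstrations live in their own buffer that is never overwritten by agent experience, and each training batch draws a fixed fraction from it. The 0.5 ratio matches the Comparison section below; treating the buffers as plain lists is an assumption:

```python
import random

def sample_mixed_batch(agent_buffer, demo_buffer,
                       batch_size=64, demo_ratio=0.5):
    """Mix demonstration and agent transitions in one training batch.

    Demonstrations sit in a separate buffer so they are never evicted
    by the agent's own experience.
    """
    n_demo = int(batch_size * demo_ratio)
    batch = random.sample(demo_buffer, n_demo)
    batch += random.sample(agent_buffer, batch_size - n_demo)
    random.shuffle(batch)
    return batch
```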
- Residual Policy Learning in ./RPL_DDPG_new/: Train a residual policy with RL (DDPG) on top of an initialization policy that was pre-trained via supervised learning on samples generated from inverse kinematics (already generated); a sketch of the residual composition follows.
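A minimal sketch of the residual composition, assuming the base actor is the behavior-cloned initialization from above (frozen) and only the residual network receives DDPG gradients:

```python
import torch

class ResidualPolicy(torch.nn.Module):
    """action = frozen pre-trained base action + learned residual."""

    def __init__(self, base_actor, residual_actor):
        super().__init__()
        self.base = base_actor
        self.residual = residual_actor
        for p in self.base.parameters():
            p.requires_grad = False  # keep the initialization fixed

    def forward(self, state):
        return self.base(state) + self.residual(state)
```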
- Meta-Learning and Policy Replacement with PPO in ./PPO: Implementations of the PPO algorithm on the Reacher environment, including:
  - PPO for Reacher with 2/3 joints;
  - PPO with an initialized policy;
  - PPO + Reptile (sketched below);
  - PPO + MAML;
  - PPO + FOMAML (first-order MAML), etc.
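As a rough illustration of the PPO + Reptile combination: adapt a copy of the policy to each sampled task with the PPO inner loop, then move the meta-parameters toward the average adapted weights. Here `ppo_train` is a hypothetical stand-in for the repository's PPO trainer, not its actual interface:

```python
import copy
import torch

def reptile_step(policy, tasks, ppo_train, inner_steps=10, meta_lr=0.1):
    """One Reptile meta-update around a PPO inner loop.

    ppo_train(policy, task, steps) is assumed to update `policy` in
    place on a single Reacher task (e.g., one goal position).
    """
    theta = {k: v.clone() for k, v in policy.state_dict().items()}
    delta = {k: torch.zeros_like(v) for k, v in theta.items()}

    for task in tasks:
        adapted = copy.deepcopy(policy)  # each task starts from theta
        ppo_train(adapted, task, steps=inner_steps)
        for k, v in adapted.state_dict().items():
            delta[k] += (v - theta[k]) / len(tasks)

    # theta <- theta + meta_lr * mean over tasks of (phi_task - theta)
    policy.load_state_dict({k: theta[k] + meta_lr * delta[k] for k in theta})
    return policy
```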
- Meta-Learning in ./MAML.
Comparison:
- ./Comparison: Comparison of different methods using demonstrations for efficient reinforcement learning (DDPG): policy replacement (./Comparison/DDPGini/), residual policy learning (./Comparison/DDPGres/), directly feeding demonstrations into the buffer with a demonstration ratio of 0.5 (./Comparison/DDPGfD/), and vanilla DDPG.
Dense Reward:
Sparse Reward: