The source code for the blog post The 37 Implementation Details of Proximal Policy Optimization
CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL
[ICCV 2021] Official implementation of "The Surprising Effectiveness of Visual Odometry Technique...
Chain-of-Hindsight, A Scalable RLHF Method
A library for differentiable robotics.
A modular RL library to fine-tune language models to human preferences
High-quality single file implementation of Deep Reinforcement Learning algorithms with research-f...
MARLToolkit: The Multi-Agent Reinforcement Learning Toolkit. Includes implementations of MAPPO, MAD...
Train vision models using JAX and 🤗 transformers
Modular Single-file Reinforcement Learning Algorithms Library
This is the official implementation of Multi-Agent PPO (MAPPO).
Source Code for A Closer Look at Invalid Action Masking in Policy Gradient Algorithms
Softlearning is a reinforcement learning framework for training maximum entropy policies in conti...
[ECCV 2022] New benchmark for evaluating pre-trained models; new supervised contrastive learning fr...