Soft Actor-Critic implementation with SOTA model-free extension (REDQ) and SOTA model-based extension (MBPO).
MIT License
Reinforcement Learning environments for Traffic Signal Control with SUMO. Compatible with Gymnasi...
A PyTorch reinforcement learning library for generalizable and reproducible algorithm implementat...
Minimal and Clean Reinforcement Learning Examples
Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tune...
Collection of reinforcement learning algorithms
SBX: Stable Baselines Jax (SB3 + Jax)
Softlearning is a reinforcement learning framework for training maximum entropy policies in conti...
Implementations of Reinforcement Learning and Planning algorithms
Code for the paper "Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via Online ...
Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI
Implementation of the Llama architecture with RLHF + Q-learning
Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcem...
Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforceme...
Code for the paper Optimistic Linear Support and Successor Features as a Basis for Optimal Policy...
Contains high-quality implementations of Deep Reinforcement Learning algorithms written in PyTorch