CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL
Softlearning is a reinforcement learning framework for training maximum entropy policies in conti...
GLM (General Language Model)
[ICML 2021] DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning | DouDizhu AI
High-quality single file implementation of Deep Reinforcement Learning algorithms with research-f...
[NeurIPS 2024 Datasets and Benchmarks Track] Closed-Loop E2E-AD Benchmark Enhanced by World Model...
LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA a...
🤗 LeRobot: End-to-end Learning for Real-World Robotics in PyTorch
Modular Single-file Reinforcement Learning Algorithms Library
minichatgpt - To Train ChatGPT In 5 Minutes
The source code for the blog post "The 37 Implementation Details of Proximal Policy Optimization"
Train vision models using JAX and 🤗 transformers
Chain-of-Hindsight, A Scalable RLHF Method
An open platform for training, serving, and evaluating large language models. Release repo for Vi...