[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning
MIT License
Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the PaLM architectu...
GLM (General Language Model)
PyTorch code for "Iterative Token Evaluation and Refinement for Real-World Super-Resolution", AA...
[ACL 2023] Learning Multi-step Reasoning by Solving Arithmetic Tasks. https://arxiv.org/abs/2306....
minichatgpt - To Train ChatGPT In 5 Minutes
[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities...
MOSS-RLHF
Code and documentation to train Stanford's Alpaca models, and generate the data.
Dromedary: towards helpful, ethical and reliable LLMs.
Implementation of the training framework proposed in Self-Rewarding Language Model, from Meta AI
Continual Hyperparameter Selection Framework. Compares 11 state-of-the-art Lifelong Learning meth...
Usable Implementation of "Bootstrap Your Own Latent" self-supervised learning, from DeepMind, in ...
Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tune...