DPO-ST

[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning

MIT License

Ecosystems: Python

Related Projects

PaLM-rlhf-pytorch

Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architectu...

09 Dec 2022 7,595

GLM

GLM (General Language Model)

18 Mar 2021 3,170

ITER

PyTorch code for "Iterative Token Evaluation and Refinement for Real-World Super-Resolution", AA...

10 Dec 2023 47

MsAT

[ACL 2023] Learning Multi-step Reasoning by Solving Arithmetic Tasks. https://arxiv.org/abs/2306....

02 Jun 2023 4

minichatgpt

minichatgpt - To Train ChatGPT In 5 Minutes

23 Feb 2023 155

tree-of-thought-llm

[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models

17 May 2023 4,623

OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities...

29 Jan 2022 2,401

MOSS-RLHF

05 Jul 2023 1,274

open-instruct

09 Jun 2023 1,214

stanford_alpaca

Code and documentation to train Stanford's Alpaca models, and generate the data.

10 Mar 2023 28,912

Dromedary

Dromedary: towards helpful, ethical and reliable LLMs.

03 May 2023 1,114

self-rewarding-lm-pytorch

Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI

19 Jan 2024 1,321

CLsurvey

Continual Hyperparameter Selection Framework. Compares 11 state-of-the-art Lifelong Learning meth...

06 Apr 2020 192

byol-pytorch

Usable Implementation of "Bootstrap Your Own Latent" self-supervised learning, from Deepmind, in ...

16 Jun 2020 1,687

mm-cot

Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tune...

02 Feb 2023 3,760