MIT License
Find better generation parameters for your LLM
Lightning implementation of a seq2seq dialog model
A lightweight evaluation suite tailored specifically for assessing Indic LLMs across a diverse ra...
Hierarchical Sketch Induction for Paraphrase Generation (Hosking et al., ACL 2022)
Chain-of-Hindsight, A Scalable RLHF Method
Token-free Language Modeling with ByGPT5 & Friends!
RLHF implementation details of OpenAI's 2019 codebase
We present NoticIA, a dataset consisting of 850 Spanish news articles featuring prominent clickba...
Code accompanying the paper "Pretraining Language Models with Human Preferences"
Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging
A modular RL library to fine-tune language models to human preferences
The source code for the blog post "The 37 Implementation Details of Proximal Policy Optimization"