DPO-ST

[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning

MIT License

Stars: 3
