Online-RLHF

Public

A recipe for online RLHF and online iterative DPO.

Created: 2024-05-10T22:33:50
Updated: 2025-03-25T21:19:21
https://rlhflow.github.io/
Stars: 537
Stars Increase: 0