RLHF-CustomData

Public

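Builds an LLM with reinforcement learning from human feedback (RLHF), fine-tuning the model on human-labeled preferences. Following the paper "Learning to Summarize from Human Feedback", it proceeds in three stages: supervised fine-tuning, reward modeling, and policy optimization with PPO, with the aim of improving response quality and alignment.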
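The reward-modeling stage can be illustrated with the pairwise preference loss used in "Learning to Summarize from Human Feedback": the reward model is trained so that the human-preferred response scores higher than the rejected one. The sketch below is a minimal standalone version, not code from this repository; the function name `pairwise_preference_loss` and its scalar arguments are assumptions made for illustration.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_chosen - reward_rejected)).

    Hypothetical helper for illustration: the two arguments stand for the
    scalar scores a reward model assigns to the preferred and the rejected
    response for the same prompt.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of the
    # margin so that exp() is never called with a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss shrinks toward zero as the preferred response outscores the rejected one by a wider margin, and grows roughly linearly when the ranking is inverted. In a real training loop the same expression would typically be computed on batched tensors in a deep-learning framework.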

Created: 2025-03-19T21:50:53
Updated: 2025-03-24T21:42:11
Stars: 1
Stars increase: 0