reinforcement-learning-human-feedback-scratch
End-to-end implementation of Reinforcement Learning from Human Feedback (RLHF) to align a GPT-2 model with human preferences, covering Supervised Fine-Tuning (SFT), Reward Modeling, and PPO-based alignment, built from scratch in Python.
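The snippet below is a minimal illustrative sketch, not the repository's actual code, of the pairwise loss typically used in the Reward Modeling stage: a Bradley-Terry style objective that pushes the scalar reward of the human-preferred completion above that of the rejected one. It assumes a PyTorch implementation, and the function and variable names (`pairwise_reward_loss`, `reward_chosen`, `reward_rejected`) are hypothetical.

```python
# Hedged sketch of a pairwise reward-model loss (Bradley-Terry style),
# assuming PyTorch; names are illustrative, not taken from this repository.
import torch
import torch.nn.functional as F

def pairwise_reward_loss(reward_chosen: torch.Tensor,
                         reward_rejected: torch.Tensor) -> torch.Tensor:
    """Encourage the reward model to score the human-preferred completion
    higher than the rejected one: -log sigmoid(r_chosen - r_rejected)."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: random scalar rewards for a batch of 4 preference pairs.
if __name__ == "__main__":
    chosen = torch.randn(4)
    rejected = torch.randn(4)
    print(pairwise_reward_loss(chosen, rejected))
```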