reinforcement-learning-human-feedback-scratch

Public

End-to-end implementation of Reinforcement Learning from Human Feedback (RLHF) to align a GPT-2 model with human preferences, covering Supervised Fine-Tuning (SFT), Reward Modeling, and PPO-based alignment, built from scratch in Python.
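
For orientation, here is a minimal, self-contained sketch of the three-stage pipeline the description names (SFT, reward modeling, PPO-based alignment). All function names and the toy bag-of-words "model" are hypothetical placeholders, not the repository's code, and the final stage is a simplified reward-guided update standing in for real PPO over a GPT-2 policy.

```python
# Hypothetical sketch of the three RLHF stages; none of these names or
# data structures come from the repository itself.

# --- Stage 1: Supervised Fine-Tuning (SFT) ---------------------------------
# The real pipeline fine-tunes GPT-2 on demonstration data; here the "model"
# is just a bag-of-words score table built from demonstration responses.
def supervised_fine_tune(demonstrations):
    model = {}
    for _prompt, response in demonstrations:
        for token in response.split():
            model[token] = model.get(token, 0.0) + 1.0
    return model

# --- Stage 2: Reward Modeling ----------------------------------------------
# Learn a reward function from pairwise human preferences (chosen vs. rejected).
def train_reward_model(comparisons):
    weights = {}
    for chosen, rejected in comparisons:
        for token in chosen.split():
            weights[token] = weights.get(token, 0.0) + 1.0
        for token in rejected.split():
            weights[token] = weights.get(token, 0.0) - 1.0
    return lambda text: sum(weights.get(t, 0.0) for t in text.split())

# --- Stage 3: Reward-guided alignment (stand-in for PPO) -------------------
# Real PPO optimizes a clipped surrogate objective with a KL penalty against
# the SFT policy; this toy loop only shows the data flow: sample a response,
# score it with the reward model, push the policy toward high-reward outputs.
def align(model, reward_fn, steps=50, lr=0.1):
    for _ in range(steps):
        # "Sample" a response from the current policy (top-scored tokens).
        response = " ".join(sorted(model, key=model.get, reverse=True)[:3])
        advantage = reward_fn(response)
        for token in response.split():
            model[token] += lr * advantage
    return model

if __name__ == "__main__":
    sft_model = supervised_fine_tune([("greet", "hello there friend")])
    reward = train_reward_model([("hello there friend", "go away now")])
    aligned = align(sft_model, reward)
    print(sorted(aligned.items(), key=lambda kv: -kv[1]))
```

In the actual repository, each stage would update GPT-2 weights via gradient descent rather than a score table; the sketch only mirrors the order of the stages and the data each one consumes.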

Created: 2025-09-12T22:06:18
Updated: 2025-09-24T02:17:00
Stars: 1
Stars Increase: 0