reinforcement-learning-human-feedback-scratch
End-to-end implementation of Reinforcement Learning from Human Feedback (RLHF) to align a GPT-2 model with human preferences, covering Supervised Fine-Tuning (SFT), Reward Modeling, and PPO-based alignment, built from scratch in Python.
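The snippet below is a minimal illustrative sketch, not the repository's actual code, of the pairwise loss typically used in the Reward Modeling stage: a Bradley-Terry style objective that pushes the scalar reward of the human-preferred completion above that of the rejected one. It assumes a PyTorch implementation, and the function and variable names (`pairwise_reward_loss`, `reward_chosen`, `reward_rejected`) are hypothetical.

```python
# Hedged sketch of a pairwise reward-model loss (Bradley-Terry style),
# assuming PyTorch; names are illustrative, not taken from this repository.
import torch
import torch.nn.functional as F

def pairwise_reward_loss(reward_chosen: torch.Tensor,
                         reward_rejected: torch.Tensor) -> torch.Tensor:
    """Encourage the reward model to score the human-preferred completion
    higher than the rejected one: -log sigmoid(r_chosen - r_rejected)."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: random scalar rewards for a batch of 4 preference pairs.
if __name__ == "__main__":
    chosen = torch.randn(4)
    rejected = torch.randn(4)
    print(pairwise_reward_loss(chosen, rejected))
```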