Reinforcement-Fine-Tuning-LLMs-with-GRPO

Public

This course teaches how to fine-tune LLMs using Group Relative Policy Optimization (GRPO), a reinforcement learning method that improves model reasoning with minimal data. It covers reinforcement fine-tuning (RFT) concepts, reward design, LLM-as-a-judge evaluation, and deploying jobs on the Predibase platform.
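As a rough illustration of the "group relative" part of GRPO, the sketch below scores several completions sampled for the same prompt and normalizes each reward against the mean and standard deviation of its own group. It is a minimal sketch, not the course's or Predibase's code: the function name `group_relative_advantages`, the `eps` term, the use of the population standard deviation, and the sample rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and std.

    `rewards` holds one scalar reward per completion sampled for a single
    prompt. `eps` (an assumed small constant) avoids division by zero when
    every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of the same prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group where every completion scores the same carries no learning signal.
flat = group_relative_advantages([0.5, 0.5, 0.5])
```

Completions that score above their group's average get a positive advantage and those below get a negative one, so no separately trained value model is needed to provide a baseline.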

ai-evaluation ai-optimization ai-training deeplearning-ai-courses grpo language-model llm-as-judge llm-development llm-fine-tuning machine-learning-algorithms

Created: 2025-05-31T02:19:49

Updated: 2025-06-11T19:44:23

https://www.deeplearning.ai/short-courses/reinforcement-fine-tuning-llms-grpo/


Related projects

AutoGPT

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.

179572

9 months ago

+32 today

Stable Diffusion Webui

Hot

Stable Diffusion web UI

158047

1 year ago

+148 today

N8n

Hot

Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.

155104

4 years ago

+528 today

Langchain

Hot

Elixir implementation of a LangChain style framework that lets Elixir projects integrate with and leverage LLMs.

119285

1 year ago

+183 today

Dify

Hot

agent

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.

118483

7 months ago

+188 today