
Reinforcement-Fine-Tuning-LLMs-with-GRPO

Public

The course teaches how to fine-tune LLMs using Group Relative Policy Optimization (GRPO)—a reinforcement learning method that improves model reasoning with minimal data. Learn RFT concepts, reward design, LLM-as-a-judge evaluation, and deploy jobs on the Predibase platform.
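The group-relative step that gives GRPO its name can be sketched as follows: sample several completions per prompt, score each with a reward function, and normalize the rewards within the group to obtain advantages. This is a minimal illustration of the idea only; the function name and the 0/1 reward values are hypothetical, not part of the course materials or the Predibase API.

```python
def group_relative_advantages(rewards):
    """Normalize rewards within a sampled group: (r - mean) / std."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5 or 1.0  # guard against zero variance in the group
    return [(r - mean) / std for r in rewards]

# Example: four sampled completions for one prompt, each scored 0 or 1
# by a hypothetical reward function (e.g. "answer has the correct format")
rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)
```

Completions scoring above the group mean receive positive advantages and are reinforced; those below the mean are discouraged, without needing a separate learned value model.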

Created: 2025-05-31T02:19:49
Updated: 2025-06-11T19:44:23
https://www.deeplearning.ai/short-courses/reinforcement-fine-tuning-llms-grpo/
Stars: 0 (increase: 0)
