fine-tuning-and-reinforcement-learning-on-llms

Public

Supervised fine-tuning and RLAIF on DeepSeek-math-7b-base, using LoRA adapters and the GRPO training objective.

Created: 2025-11-06T09:01:08
Updated: 2025-11-06T15:32:06
Stars: 1
Stars increase: 0

Related projects