fine-tuning-and-reinforcement-learning-on-llms

Public

Supervised fine-tuning and RLAIF on DeepSeek-math-7b-base, using LoRA adapters and the GRPO training objective.

Created: 2025-11-06T09:01:08
Updated: 2025-11-06T15:32:06
Stars: 1
Stars increase: 0

Related projects