AIbase

Guide-GRPO

Public

Aims to enable memory-efficient training on consumer GPUs with 24 GB of VRAM. It optimizes language models by inserting guidance tokens into their reasoning chains and is based on DeepSeekRL-Extended.
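Guide-GRPO builds on GRPO (Group Relative Policy Optimization). The sketch below illustrates GRPO's core step under stated assumptions. It is not this project's code and does not model the guidance tokens. For each prompt, several completions are sampled. Each completion's reward is then normalized against the mean and standard deviation of its own group. This yields a baseline without a separate value (critic) model, which is one reason GRPO is used when GPU memory is limited. The function name `group_relative_advantages`, the use of the population standard deviation, and the `eps` term are illustrative assumptions.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Hypothetical helper: GRPO-style advantages for one group of completions.

    Each reward is normalized as (r - mean) / (std + eps), where mean and std
    are computed over the group sampled for the same prompt. Population std is
    an assumption here; some implementations use the sample std instead.
    """
    n = len(rewards)
    if n == 0:
        return []
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n  # population variance
    std = math.sqrt(var)
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four completions for one prompt, scored by a hypothetical 0/1 reward
    rewards = [1.0, 0.0, 0.0, 1.0]
    print(group_relative_advantages(rewards))
```

Completions that score above their group's average get a positive advantage, and those below it get a negative one. If all completions in a group receive the same reward, every advantage is zero and that group contributes no policy-gradient signal.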

Created: 2025-02-23T18:37:40
Updated: 2025-03-06T11:11:08
Stars: 30
Stars Increase: 0