Guide-GRPO
PublicAims for memory-efficient training (24GB VRAM) on consumer GPUs. Optimizing language models through guidance tokens in reasoning chains, based on DeepSeekRL-Extended.
Discover Popular AI-MCP Services - Find Your Perfect Match Instantly
Easy MCP Client Integration - Access Powerful AI Capabilities
Master MCP Usage - From Beginner to Expert
Top MCP Service Performance Rankings - Find Your Best Choice
Publish & Promote Your MCP Services
Aims for memory-efficient training (24GB VRAM) on consumer GPUs. Optimizing language models through guidance tokens in reasoning chains, based on DeepSeekRL-Extended.