Nano-R1
PublicThis project demonstrates the process of fine-tuning the Qwen2.5-3B-Instruct model using GRPO (Generalized Reward Policy Optimization) on the GSM8K dataset.
Discover Popular AI-MCP Services - Find Your Perfect Match Instantly
Easy MCP Client Integration - Access Powerful AI Capabilities
Master MCP Usage - From Beginner to Expert
Top MCP Service Performance Rankings - Find Your Best Choice
Publish & Promote Your MCP Services
This project demonstrates the process of fine-tuning the Qwen2.5-3B-Instruct model using GRPO (Generalized Reward Policy Optimization) on the GSM8K dataset.