Nano-R1
This project demonstrates fine-tuning the Qwen2.5-3B-Instruct model with GRPO (Group Relative Policy Optimization) on the GSM8K dataset.
Created: 2025-04-04T14:00:58
Updated: 2025-04-04T15:28:25
https://huggingface.co/Akshint47/Nano_R1_Model
Stars: 3
Stars Increase: 0
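
The description above names GRPO on GSM8K but does not show the project's training code. As a rough illustration only, the sketch below shows the two pieces such a setup commonly needs: a correctness reward that compares a completion's final number with the GSM8K reference answer (the value after `####`), and the group-relative advantage that normalizes rewards across several completions sampled for the same prompt. The function names, the "last number in the completion" heuristic, and the `eps` value are all assumptions made for this example, not details taken from the Nano-R1 repository.

```python
import re
from statistics import mean, pstdev


def extract_gsm8k_answer(solution: str) -> str:
    """Return the final answer that follows '####' in a GSM8K reference solution."""
    return solution.split("####")[-1].strip().replace(",", "")


def extract_last_number(completion: str):
    """Hypothetical heuristic: treat the last number in the completion as its answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    return numbers[-1] if numbers else None


def correctness_reward(completion: str, reference_solution: str) -> float:
    """Return 1.0 if the completion's final number matches the reference, else 0.0."""
    predicted = extract_last_number(completion)
    if predicted is None:
        return 0.0
    try:
        return 1.0 if float(predicted) == float(extract_gsm8k_answer(reference_solution)) else 0.0
    except ValueError:
        # The reference answer was not a plain number; give no reward.
        return 0.0


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    reference = "Janet sells 16 - 3 - 4 = 9 eggs. 9 * 2 = 18\n#### 18"
    group = [
        "She has 9 eggs left, so she makes 9 * 2 = 18. The answer is 18.",
        "She makes 20 dollars.",
        "18",
        "I am not sure.",
    ]
    rewards = [correctness_reward(c, reference) for c in group]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Completions that score above the group mean receive a positive advantage and those below it a negative one, so the policy update favors the better answers within each group without needing a separate value model. If every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.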