DeepEnlighten
Public
Pure RL without SFT to post-train base models for social reasoning capabilities. A lightweight replication of DeepSeek-R1-Zero using the Social IQa dataset.
deepseek, deepseek-r1, fine-tuning, gpt-o1, llm, post-training, reasoning-language-models, reasoning-models, reinforcement-learning
Created: 2025-03-12T21:18:28
Updated: 2025-03-27T03:36:34
Stars: 38
Stars Increase: 0