Logic-RL-Lite
Lightweight replication study of DeepSeek-R1-Zero. Interesting findings include "No Aha Moment", "Longer CoT ≠ Accuracy", and "Language Mixing in Instruct Models".
deepseek, deepseek-r1, fine-tuning, gpt-o1, llm, post-training, reasoning-language-models, reasoning-models, reinforcement-learning
Created: 2025-02-28T23:22:29
Updated: 2025-03-26T22:50:05
Stars: 50
Stars Increase: 0