learning-from-rewards-llm-papers
Public
A comprehensive collection of papers on learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward models and learning strategies across the training, inference, and post-inference stages.
guided-decoding, large-language-models, llm, llms, post-training, reinforcement-learning, reward-learning, reward-model, reward-modeling, reward-models
Created: 2025-05-06T20:37:28
Updated: 2025-06-16T22:26:37
Stars: 53
Stars Increase: 1