
multiturn-20q

Public

Multi-turn RLHF applied to the 20 Questions game, using proxy rewards to improve LLM accuracy in a sparse-reward setting
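The description does not say which proxy rewards the project uses. As a minimal sketch of the general idea, assuming a uniform prior over a known candidate set, each yes/no question could be scored by its expected information gain, so the policy gets a per-turn signal instead of waiting for the final guess. The names `proxy_reward` and `shaped_return`, and the `weight` and `terminal_bonus` parameters, are hypothetical and not taken from the repository.

```python
import math


def proxy_reward(candidates, answers_yes):
    """Expected information gain (bits) of one yes/no question.

    Assumption: uniform prior over `candidates`, the set of objects still
    consistent with earlier answers. `answers_yes` is the set of objects
    for which the answer to this question would be "yes".
    """
    n = len(candidates)
    k = len(candidates & answers_yes)
    if n == 0 or k == 0 or k == n:
        return 0.0  # the question does not split the candidate set
    p = k / n
    # Under a uniform prior the expected entropy reduction equals the
    # binary entropy of the yes/no split.
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def shaped_return(step_rewards, solved, weight=0.1, terminal_bonus=1.0):
    """Combine dense per-turn proxy rewards with the sparse end-of-game reward.

    Hypothetical shaping: weighted sum of proxy rewards plus a bonus
    that is paid only if the final guess was correct.
    """
    return weight * sum(step_rewards) + (terminal_bonus if solved else 0.0)
```

A question that halves the candidate set earns the maximum of one bit, while a question that every remaining candidate answers the same way earns nothing, which is the kind of intermediate signal a sparse win/lose reward lacks.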

Created: 2025-04-05T05:24:45
Updated: 2025-06-10T00:36:59
Stars: 0
Stars Increase: 0