
quantile-reward-policy-optimization

Public

Official codebase for "Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions" (Matrenok et al. 2025).

Created: 2025-07-14T22:48:30
Updated: 2025-07-15T05:49:26
https://claire-labo.github.io/quantile-reward-policy-optimization/
Stars: 24
Stars increase: 0
