safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Created: 2023-05-15T19:47:08
Updated: 2025-03-27T11:11:18
https://pku-beaver.github.io
Stars: 1.5K