
rewardhackwatch

Public

Runtime detector for reward hacking and misalignment in LLM agents (89.7% F1 on 5,391 trajectories).

Created: 2025-12-09T08:59:40
Updated: 2025-12-09T09:48:25
https://huggingface.co/aerosta/rewardhackwatch
Stars: 0
Stars increase: 0
