autoredteam
Public
autoredteam: code for training models that automatically red team other language models
Created: 2023-06-01T01:40:33
Updated: 2024-12-09T18:52:35
https://interhumanagreement.substack.com/p/faketoxicityprompts-automatic-red
Stars: 11
Stars increase: 0