Anthropic Launches Audit Agent to Aid in AI Model Alignment Testing
Anthropic has introduced AI audit agents — an investigation agent, an evaluation agent, and a red-teaming agent — to strengthen model alignment testing. The agents enable audits to run in parallel and detected deliberately implanted misaligned behaviors with a 42% success rate, addressing the bottlenecks of manual audits. The code has been open-sourced on GitHub.