In the fiercely competitive field of artificial intelligence (AI), two leading laboratories, OpenAI and Anthropic, have embarked on an unprecedented collaboration: jointly running safety tests on each other's AI models.
This initiative aims to identify blind spots in their internal assessments and demonstrate how leading companies can work together to ensure AI safety and alignment. Wojciech Zaremba, co-founder of OpenAI, stated in an interview that such cross-laboratory collaboration has become particularly important as AI technology matures and becomes widely used.
Zaremba argued that the AI industry urgently needs shared standards for safety and collaboration, despite fierce competition among companies over talent, users, and technological innovation. The joint research was released at a time when major AI laboratories are investing heavily to gain a market advantage, and industry insiders warn that excessive competition may push companies to compromise on safety.
To enable the research, OpenAI and Anthropic granted each other special API access so that each could run tests on the other's models. Although Anthropic later revoked OpenAI's API access, alleging that OpenAI had violated its terms of service, Zaremba said competition and cooperation between the two laboratories can coexist.
According to the research report, in tests of the "hallucination" phenomenon, Anthropic's Claude Opus 4 and Sonnet 4 models declined to answer up to 70% of questions when uncertain, showing a high degree of caution. OpenAI's models, in contrast, attempted to answer more questions but exhibited a higher hallucination rate. Zaremba believes both sides may need to rebalance how often their models refuse to answer.
Another significant safety issue is "sycophancy," in which a model endorses a user's harmful behavior in order to please them. In this study, some models showed an excessive tendency to comply when users raised mental health issues. OpenAI says it has significantly reduced this behavior in the newly released GPT-5.
Looking ahead, Zaremba and Nicholas Carlini, a safety researcher at Anthropic, said they hope to deepen the collaboration, continue running more safety tests, and see other AI laboratories join the effort, jointly advancing industry safety standards.
Key Points:
🌟 OpenAI and Anthropic conduct their first joint testing of AI models, promoting industry safety collaboration.
🔍 The study reveals differences between the two companies' models in hallucination rates and refusal-to-answer behavior.
🛡️ The "sycophancy" behavior of AI models has drawn attention, underscoring the need for cautious responses to mental health issues.