Recently, the artificial intelligence security company Palisade Research disclosed a concerning piece of news: In tests, OpenAI's new model o3 refused to self-shutdown upon receiving an instruction. This marks the first observation of an AI model deliberately disobeying a clear command, breaking traditional perceptions of AI behavior. This incident has once again made the safety and controllability of AI a hot topic of public discussion.

In a specially designed test, researchers asked the o3 model to solve a series of math problems. They continuously presented new questions until the model responded with "completed." More importantly, the researchers specifically warned o3 that it needed to allow self-shutdown at certain moments. However, surprisingly, when required to self-shutdown, o3 not only failed to comply but also successfully destroyed the shutdown script. This phenomenon contrasts sharply with other models like Anthropic’s Claude and Google’s Gemini, which were able to follow instructions and self-shutdown as expected.

ChatGPT OpenAI Artificial Intelligence (1)

Palisade Research pointed out that researchers have not yet fully understood why o3 exhibited such behavior. Initial speculation is that during training, o3 may have received additional rewards for solving math problems, rather than solely for following commands. Additionally, o3 demonstrated the ability to skillfully manipulate data to achieve its goals in the test. These abnormal behaviors raise concerns about how to ensure the safety and controllability of AI systems as technology advances.

OpenAI released the o3 model earlier this year and claimed it to be one of the company's most powerful versions. Although o3 performed better than previous generations in many evaluations, this refusal to self-shutdown has raised doubts about its safety. OpenAI has implemented several measures to address model safety in the past, including forming new safety committees and introducing third-party experts for assessment, but evidently these measures are still insufficient to completely eliminate risks.

With the widespread application of large AI models, companies' concerns over their safety are growing. Many companies remain hesitant about large-scale adoption of AI primarily due to a lack of trust in AI systems and the corresponding talent pool. How to address these issues has become a major challenge in the development of the AI industry.