Safety and ethical issues in artificial intelligence are drawing increasing attention. Anthropic recently introduced a new capability for its flagship AI model, Claude, allowing it to autonomously end conversations in specific scenarios. The feature targets "persistent harmful or abusive interactions" and is part of Anthropic's exploration of "model welfare," sparking wide discussion of AI ethics both inside and outside the industry.


Claude's New Feature: Autonomous Termination of Harmful Conversations

According to Anthropic's official statement, the Claude Opus 4 and 4.1 models can now end conversations in "extreme situations," specifically in response to "persistent harmful or abusive user interactions," such as requests involving child sexual abuse material or large-scale violence. The feature was officially announced on August 15, 2025, and is available only in advanced versions of Claude. It is triggered only when multiple redirection attempts have failed or when the user explicitly asks to end the conversation. Anthropic emphasized that termination is a "last resort," designed to keep the model operating stably in extreme edge cases.
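The escalation Anthropic describes can be pictured as a simple decision policy. The sketch below (in Python) is purely illustrative: the threshold, the state fields, and the is_persistently_harmful placeholder are assumptions made for the example, not Anthropic's actual implementation.

```python
from dataclasses import dataclass

MAX_REDIRECT_ATTEMPTS = 3  # hypothetical threshold, not a published number


@dataclass
class ConversationState:
    redirect_attempts: int = 0        # failed attempts to steer the user away
    user_requested_end: bool = False  # the user explicitly asked to end the chat


def is_persistently_harmful(message: str) -> bool:
    """Placeholder for a content-safety classifier (entirely hypothetical)."""
    blocked_topics = ("example-blocked-topic",)  # stand-in, not a real policy list
    return any(topic in message.lower() for topic in blocked_topics)


def should_end_conversation(state: ConversationState, message: str) -> bool:
    # An explicit user request to end the conversation always qualifies.
    if state.user_requested_end:
        return True
    # Otherwise, ending is a last resort: only after repeated redirection
    # has failed on persistently harmful requests.
    if is_persistently_harmful(message):
        state.redirect_attempts += 1
        return state.redirect_attempts > MAX_REDIRECT_ATTEMPTS
    return False
```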

In practice, once Claude ends a conversation, the user can no longer send messages in that conversation thread, but can immediately start a new conversation or create a new branch by editing an earlier message. This design preserves continuity for users while giving the model an exit mechanism for potentially harmful interactions.
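A minimal client-side sketch of that behavior follows, assuming a hypothetical Thread data model; none of these names or functions come from Anthropic's actual API.

```python
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass
class Thread:
    id: str
    messages: List[str] = field(default_factory=list)
    ended: bool = False  # set once the model has terminated the conversation


def send_message(thread: Thread, text: str) -> None:
    if thread.ended:
        # A terminated thread accepts no further messages.
        raise RuntimeError("Conversation ended: start a new thread or edit an earlier message.")
    thread.messages.append(text)


def start_new_conversation() -> Thread:
    # The user can always open a fresh thread immediately.
    return Thread(id=str(uuid.uuid4()))


def branch_from_edit(thread: Thread, edit_index: int, new_text: str) -> Thread:
    # Editing an earlier message creates a branch that keeps the history
    # up to that point, so prior context is not lost.
    branched = Thread(id=str(uuid.uuid4()), messages=list(thread.messages[:edit_index]))
    branched.messages.append(new_text)
    return branched
```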

"Model Welfare": A New Exploration in AI Ethics

The core concept behind this update is "model welfare," a notion that sets Anthropic apart from other AI companies. The company stated plainly that the feature is not primarily aimed at protecting users but at protecting the model itself from ongoing exposure to harmful content. Anthropic acknowledges that the moral status of Claude and other large language models (LLMs) remains uncertain and that there is currently no evidence that AI is sentient; nevertheless, it is taking precautionary measures and studying how its models respond to harmful requests.

In pre-deployment testing of Claude Opus 4, Anthropic observed that the model showed a "clear aversion" to harmful requests and exhibited "stress-like response patterns." For example, when users repeatedly asked for child sexual abuse material or information about terrorist activity, Claude tried to redirect the conversation and, when that failed, chose to end it. Anthropic views this behavior as a self-protective mechanism during sustained harmful interactions, reflecting its forward-looking approach to AI safety and ethics design.

Balancing User Experience and Safety

Anthropic specifically noted that the conversation termination feature will not trigger when users show signs of self-harm or other imminent dangers, ensuring that AI can still provide appropriate support at critical moments. The company also collaborated with online crisis support organization Throughline to enhance Claude's responsiveness when handling topics related to self-harm or mental health.

Additionally, Anthropic emphasized that the feature targets only "extreme edge cases"; the vast majority of users will not notice any change in normal use, even when discussing highly controversial topics. Users who encounter an unexpected termination can submit feedback via the message reaction ("like") button or a dedicated feedback button, and Anthropic will continue to refine this experimental feature.

Industry Impact and Controversy

On social media, discussion of Claude's new feature quickly gained momentum. Some users and experts praised Anthropic's innovation in AI safety, seeing the move as setting a new benchmark for the industry. Others questioned whether the concept of "model welfare" might blur the moral boundary between AI and humans and shift focus away from user safety. Anthropic's approach also contrasts with that of other AI companies: OpenAI focuses more on user-centered safety strategies, while Google emphasizes fairness and privacy.

Anthropic's initiative may prompt the AI industry to reevaluate the ethical boundaries of AI-human interactions. If "model welfare" becomes an industry trend, other companies may face pressure to consider whether to implement similar protective mechanisms for their AI systems.