Anthropic says some Claude models can now end ‘harmful or abusive’ conversations

August 17, 2025

Anthropic has announced new capabilities that will allow some of its newest, largest models to end conversations in what the company describes as “rare, extreme cases of persistently harmful or abusive user interactions.” Strikingly, Anthropic says it’s doing this not to protect the human user, but rather the AI model itself.

To be clear, the company isn’t claiming that its Claude AI models are sentient or can be harmed by their conversations with users. In its own words, Anthropic remains “highly uncertain about the potential moral status of Claude and other LLMs, now or in the future.”

However, its announcement points to a recent program created to study what it calls “model welfare” and says Anthropic is essentially taking a just-in-case approach, “working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible.”

This latest change is currently limited to Claude Opus 4 and 4.1. And again, it’s only supposed to happen in “extreme edge cases,” such as “requests from users for sexual content involving minors and attempts to solicit information that would enable large-scale violence or acts of terror.”

While those types of requests could potentially create legal or publicity problems for Anthropic itself (witness recent reporting around how ChatGPT can potentially reinforce or contribute to its users’ delusional thinking), the company says that in pre-deployment testing, Claude Opus 4 showed a “strong preference against” responding to these requests and a “pattern of apparent distress” when it did so.

As for these new conversation-ending capabilities, the company says, “In all cases, Claude is only to use its conversation-ending ability as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted, or when a user explicitly asks Claude to end a chat.”

Anthropic also says Claude has been “directed not to use this ability in cases where users might be at imminent risk of harming themselves or others.”

When Claude does end a conversation, Anthropic says users will still be able to start new conversations from the same account, and to create new branches of the troublesome conversation by editing their responses.

“We’re treating this feature as an ongoing experiment and will continue refining our approach,” the company says.
