All Models Found Angels

Anthropic's open-source safety tool found AI models whistleblowing - in all the wrong places

The "Petri" tool deploys AI agents to evaluate frontier models. AI's ability to discern harm is still highly imperfect. Early tests showed Claude Sonnet 4.5 and GPT-5 to be safest. Anthropic has ...

