All Models Found Angels

Anthropic's open-source safety tool found AI models whistleblowing - in all the wrong places

The "Petri" tool deploys AI agents to evaluate frontier models. AI's ability to discern harm is still highly imperfect. Early tests showed Claude Sonnet 4.5 and GPT-5 to be the safest. Anthropic has ...
