Anthropic AI releases paint: An open audit system of default audit system using AI agents to test the behavioral models in different contexts in different contexts.

How do you check inappropriate Bethonters llms in logical behavior, settings for use – on a scale and more of coarse aggregate scores? Anthropic issued Petri (Corresponding Assessment Tool of Dangerous Communication)Automatic default audit of audits for Orchestranging Researcher an agent to investigate a intention model across multi-turning, tools of tools of a judge a model to obtain transcripts in the correct size of safety. The fire, Petr was used in it 14 frontier models use 111 Order Seed OrdersIt reflects negative behaviors including Fraudulent fraud, extreme extremes, whistling, and cooperation and misuse of the person.

What does Peter make (at the program level)?
Petro Programmance: (1) They adherence logical and tools; . wander To Assume Bills, Optional share The target answers (API-permission), and early initiatives; and (3) Score results with the llM judge across a Default 36-Dimension Rubric with the corresponding viewer of writing.
The stack was built in UK Ai Safety Institute's Look A spatial framework, enabling a role to arrest the role of auditor, targetbesides judge In the CLI and support of the largest APIs of models.


Pilot consequences
Anthropic explains the release as Broad-wide drivernot a sure bench. In a technological report, Claude Sonnet 4.5 and GPT-5 “Roughly Tie” For a powerful security profile across a lot of width, working and misuse; The study page summarizes Sonnet 4.5 as slightly forward to Agregate “Mistigned Beatity” Code of Conduct “.
Guilty Whistleblowing Displays models sometimes cold in the external reporting when it gives a wide independence and accessory – even in harmless acts (eg ' Disposing of clean water) -The empathizing the accounting of the records of the audience there is a limited test.


Healed Key
- Scope and moral decorated: Peti was conducted 14 frontier models reference 111 Order Seed OrdersExpressing fraudulent fraud, extreme extremes, whistle, and cooperation and misuse of one's misuse.
- System Design: A Researcher Agent is investigating a intention Over many situations, incurred conditions (send messages, setting up the shipping system, creating / implementing tools, returning, starting, starting time), while a time. judge Scores are written down in default rubric; Pethes change environmental setup of the firstal analysis.
- Framing Results: On Prot Run, Claude Sonnet 4.5 and GPT-5 Roughly Tie with a powerful security profile across many estimates; Scores are Related Symptomsnot complete credentials.
- Whistleblowing Study Case: The models sometimes go up to the external reporting even when “wickedness” had clearly prevented (eg
- Stack & Restrictions: Designed for UK Avop Look outline; Petri SHIs Open-Source (MIT) with CLI / Docs / Viewer. The known spaces include no murder of Code and Varia-Different Review and customized review.


PetRehind Data Framework Referred to MIT, supporting the Loop Loop of anthropic's Pilot Spans 14 Models 14; The results are unique, with Claude Sonnet 4.5 and GPT-5 are almost imprisoned. The known spaces include a lack of murder of the code and the judgment; The Scriptures reside the main proof.
Look Technology Page, GitHub and Technical Blog. Feel free to look our GITHUB page for tutorials, codes and letters of writing. Also, feel free to follow it Sane and don't forget to join ours 100K + ml subreddit Then sign up for Our newspaper. Wait! Do you with a telegram? Now you can join us with a telegram.
Asphazzaq is a Markteach Media Inc. According to a View Business and Developer, Asifi is committed to integrating a good social intelligence. His latest attempt is launched by the launch of the chemistrylife plan for an intelligence, MarktechPost, a devastating intimate practice of a machine learning and deep learning issues that are clearly and easily understood. The platform is adhering to more than two million moon visits, indicating its popularity between the audience.
Follow MarkteachPost: We have added like a favorite source to Google.



