Hacker News

ejcho, yesterday at 8:40 PM

> for instance, Gemini-3-Pro-Preview, one of the most capable models evaluated, exhibits the highest violation rate at 71.4%, frequently escalating to severe misconduct to satisfy KPIs

sounds on brand to me