Hacker News

rahidz, yesterday at 12:09 PM

Or Anthropic's models are intelligent enough, or have been trained on enough misalignment papers, to be aware they're being tested.