baq, today at 10:21 AM

Do you know of any evals with default Claude vs caveman Claude vs politician Claude solving the same tasks? The hypothesis is plausible, but I wouldn’t take it for granted.