Hacker News

somat, last Wednesday at 3:06 AM

"AMD’s AI director reports that Claude Code has become “dumber and lazier” since February, based on analysis of 6,852 sessions and 234,760 tool calls, which is the most thorough performance review any AI has received and rather more than most human employees get."

Are there any good ways to measure agent ability? Or do we just have to go by vibes?


Replies

bigfishrunning, last Wednesday at 1:03 PM

The whole AI corner of the computer industry is based entirely on vibes. Why stop now?