tarruda • today at 9:53 AM

> I started looking at the commits, and it's basically solving the "tests not pass" problem by changing the tests themselves

Not sure if these decisions were made by the LLM, but I've always felt that Claude is more prone to doing "shady stuff", like modifying tests, than to finding correct solutions to problems.

GPT/Codex is more honest in this regard.

Replies

InsideOutSanta • today at 10:20 AM

Yeah, Claude is very creative in finding ways of "solving" problems that go against what the user probably intended.

Having said that, after looking at some of the test changes, they seem to be minor things, like changing timeouts, rather than changes to the actual intended semantics of the tests. But there's too much code to review all of it, so I might be completely wrong about that, and even minor changes like these could cause issues in real-world usage.
