threepts • yesterday at 2:27 PM • 2 replies • view on HN

That is why we have SWE-bench Pro: it tests architecture design too. It turns out that 1000 dollars of tokens outperforms 10k dollars of labor in meta design.

Replies

SpicyLemonZest • yesterday at 2:38 PM

That's just not accurate. I haven't studied SWE Bench Pro in detail, so I can't tell you exactly what the flaw is, but SOTA models routinely make bad architectural choices I have to intervene to fix.

dawnerd • yesterday at 4:23 PM

1000 dollars of subsidized tokens.
