Hacker News

glaslong, today at 4:42 AM

Principal-SWE-Bench will take some time to run, because the LLM needs to wait for a crisis to present its solution, having correctly identified that the same solution would have been organizationally impossible to propose until that moment.