The harness point is probably the most important part here. With terminal agents, it often feels lik...

Oxlamarr • today at 11:00 AM • 0 replies • view on HN

The harness point is probably the most important part here. With terminal agents, it often feels like the benchmark is measuring the whole loop: model, prompt, tool interface, retry policy, timeout handling, and recovery from bad shell commands.

Do you have a sense of which part contributed most to the jump?

alt Hacker News