Hacker News

sachaa today at 6:01 AM

Fair points, especially on GSM8K saturation and Qwen possibly already sitting close to the solution. That said, even if this is mostly "last-mile alignment", the fact that it can be done with such a tiny signal is still interesting: it suggests the gap between capability and behavior might be much smaller (and cheaper to bridge) than we assume.


Replies

endofreach today at 3:06 PM

> the gap between capability and behavior might be much smaller

Can you elaborate a bit on what you mean by the gap?