Bibabomas • today at 4:52 AM

Hey, this is something we're actively working on, but it's hard (and expensive) to do well across harnesses and models. The grep pretraining point is very interesting though, and I've noticed the same thing. E.g. Sonnet 4.6 seems to trust semble, but Opus 4.7 less so. I'm hoping we can test this quantitatively and improve it once we have proper benchmarks for it. If you have any feedback in the meantime, let me know!
