stephantul • today at 4:19 AM • 1 reply

This is a bit rude.

We didn't generate this project, we wrote it, a lot of it manually, and trained custom models. We'd been working in the real-time retrieval space for a while, and we thought coding was a good fit for this specific technology.

Replies

esperent • today at 5:00 AM

My comment above wasn't meant to be rude. And you do have extensive benchmarks against grep, etc., so it's clear you understand the importance of that.

But I still think you're missing the harder but more important proof, which is agent evals. Have you done any of that?

I would personally love to find tools in this space that can make agents more efficient, and I do believe there's scope for massive improvements compared to default workflows. But my evals with RTK and Headroom have made me wary: a tool can look like it should work, make sense conceptually, pass non-agentic benchmarks, and still make an actual agentic workflow worse.