Hacker News

telivity-real, today at 8:12 PM (1 reply)

The agent-agnostic approach is interesting, but I think the bigger architectural question is what happens when you move beyond code generation into domains where the agent's output has real-world consequences — booking a flight, executing a trade, dispensing medical advice.

For code, the worst case is a bad PR that gets caught in review. For domain-specific agents handling real transactions, you need a fundamentally different trust model. The LLM can't be making the decisions — it needs to be constrained to intent parsing while deterministic logic handles execution. Sandboxing the runtime (what you're doing) is necessary but not sufficient. You also need to sandbox the decision space.
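To make "sandbox the decision space" concrete, here is a minimal sketch, not anyone's actual implementation. It assumes the model is only allowed to emit a small JSON-like dict, which is validated against a closed set of intents and a policy limit before any deterministic handler runs. The names `Intent`, `validate`, `execute`, and `MAX_AMOUNT_USD` are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Intent(Enum):
    # Closed set of things the system can do (hypothetical examples).
    BOOK_FLIGHT = "book_flight"
    CANCEL_BOOKING = "cancel_booking"


@dataclass(frozen=True)
class ParsedRequest:
    intent: Intent
    amount_usd: float


MAX_AMOUNT_USD = 1000.0  # hypothetical policy limit, enforced outside the model


def validate(raw: dict) -> ParsedRequest:
    """Turn untrusted model output into a typed request, or raise ValueError."""
    try:
        intent = Intent(raw.get("intent"))  # unknown intents raise ValueError
        amount = float(raw.get("amount_usd"))
    except (TypeError, ValueError) as exc:
        raise ValueError(f"rejected model output: {raw!r}") from exc
    if not 0 <= amount <= MAX_AMOUNT_USD:
        raise ValueError(f"amount {amount} is outside policy limits")
    return ParsedRequest(intent, amount)


def execute(req: ParsedRequest) -> str:
    """Deterministic handlers; the model never calls this layer directly."""
    if req.intent is Intent.BOOK_FLIGHT:
        return f"booked flight for ${req.amount_usd:.2f}"
    if req.intent is Intent.CANCEL_BOOKING:
        return "cancelled booking"
    raise AssertionError("unreachable: Intent is a closed enum")
```

The point of the split is that the model can only pick from what `Intent` enumerates; anything else fails in `validate` before execution is reached.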

Curious whether you've seen demand for non-SWE agent workloads, or if the "prompt to PR" pattern is where most of the traction is right now.


Replies

danoandco, today at 8:36 PM

We’re focused on SWE use cases. Code is nice because there’s already a built-in verification loop: diffs, tests, CI, review, rollback. But you quickly reach a point where the agent needs to take a risky action (a DB migration or an infra operation). This is where the agents' permission features come in handy: allowlist, automode, etc. You only have to approve or reject the high-risk actions. And I think this risk model is valid for both technical and non-technical use cases.
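A rough sketch of that risk model, under stated assumptions: this is not any particular agent's permission API. The action names, the `ALLOWLIST` and `HIGH_RISK` sets, and the `gate` function are hypothetical, and denying unknown actions by default is a design choice of this sketch.

```python
from typing import Callable

# Hypothetical action names; a real agent would define its own.
ALLOWLIST = {"run_tests", "read_file", "open_pr"}  # run without asking
HIGH_RISK = {"db_migration", "infra_apply"}        # always ask a human


def gate(action: str, approve: Callable[[str], bool]) -> str:
    """Decide whether an agent action runs, given a human approval callback."""
    if action in ALLOWLIST:
        return "run"
    if action in HIGH_RISK:
        # Only these actions interrupt the human.
        return "run" if approve(action) else "rejected"
    # Anything unrecognized is denied by default (assumption of this sketch).
    return "rejected"
```

For example, `gate("db_migration", approve=lambda a: input(f"allow {a}? [y/N] ") == "y")` would prompt only for the migration, while `gate("run_tests", ...)` never calls the callback.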