Hacker News

saezbaldo, yesterday at 8:59 PM

The cascading failure point is critical. A 1% miss rate per layer in a 5-layer pipeline gives you roughly 5% end-to-end failure, and that's assuming independence. In practice the failures correlate because multilingual edge cases that bypass one guardrail tend to bypass adjacent ones too.
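For anyone who wants to check the arithmetic, here is a quick sketch. It assumes a chain where a miss in any layer counts as an end-to-end failure and misses are independent, which is the optimistic case. The function name is just for illustration.

```python
def end_to_end_failure(miss_rate: float, layers: int) -> float:
    """Probability that at least one of `layers` independent layers misses."""
    # Chance that every layer succeeds, then take the complement.
    return 1.0 - (1.0 - miss_rate) ** layers


rate = end_to_end_failure(0.01, 5)
print(f"{rate:.4f}")  # 0.0490, i.e. roughly 5%
```

The independence assumption is the part that breaks down in practice, so treat this number as a baseline rather than a prediction.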

The observation that guardrails need to move from static policy filters to composable decision layers is exactly right. But I'd push further: the layer that matters most isn't the one checking outputs. It's the one checking authority before the action happens.

A policy filter that misses a Persian prompt injection still blocks the action if the agent doesn't hold a valid authorization token for that scope. The authorization check doesn't need to understand the content at all. It just needs to verify: does this agent have a cryptographically valid, non-exhausted capability token for this specific action?
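A minimal sketch of what such a check could look like, under some loud assumptions: HMAC with a shared secret stands in for whatever signature scheme a real system would use, use counts live in an in-memory dict on the verifier side, and every name here (`CapabilityToken`, `issue`, `authorize`, `SECRET_KEY`) is hypothetical. The point is that nothing in `authorize` ever looks at the prompt or the content.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical issuer key
_uses: dict[str, int] = {}  # verifier-side use counter, keyed by token id


@dataclass(frozen=True)
class CapabilityToken:
    token_id: str
    agent_id: str
    scope: str      # the single action this token authorizes
    max_uses: int
    signature: str


def _sign(token_id: str, agent_id: str, scope: str, max_uses: int) -> str:
    payload = f"{token_id}|{agent_id}|{scope}|{max_uses}".encode()
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def issue(agent_id: str, scope: str, max_uses: int) -> CapabilityToken:
    token_id = secrets.token_hex(8)  # random id so each token is unique
    sig = _sign(token_id, agent_id, scope, max_uses)
    return CapabilityToken(token_id, agent_id, scope, max_uses, sig)


def authorize(token: CapabilityToken, agent_id: str, action: str) -> bool:
    # 1. Cryptographically valid? Any tampered field changes the expected MAC.
    expected = _sign(token.token_id, token.agent_id, token.scope, token.max_uses)
    if not hmac.compare_digest(expected, token.signature):
        return False
    # 2. Issued to this agent, for this specific action?
    if token.agent_id != agent_id or token.scope != action:
        return False
    # 3. Not exhausted?
    used = _uses.get(token.token_id, 0)
    if used >= token.max_uses:
        return False
    _uses[token.token_id] = used + 1
    return True


token = issue("agent-7", "email.send", max_uses=1)
print(authorize(token, "agent-7", "email.send"))  # True
print(authorize(token, "agent-7", "email.send"))  # False: token exhausted

other = issue("agent-7", "email.send", max_uses=1)
print(authorize(other, "agent-7", "db.delete"))   # False: wrong scope
```

Every branch is a yes/no decision on fields the verifier controls, so a prompt injection in any language has nothing to work with at this layer.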

That separates the content safety problem (hard, language-dependent, probabilistic) from the authority control problem (solvable with cryptography, language-independent, deterministic). You still need both, but the structural layer catches what the probabilistic layer misses.