If I write a filesystem watcher in Go that notifies me when any file changes, with the rules written in a file as regular expressions, then (assuming the regexes and their implementation aren't buggy) there isn't a snowball's chance in hell of it misnotifying or miscategorizing anything.
Are LLMs that super reliable in their output already with all the guardrails around?
Don't think so. Hence it's snake oil, just like dozens of harnesses.
An LLM might behave differently than specified, so a human has to carefully validate every output, or else.
> Are LLMs that super reliable in their output already with all the guardrails around?
Well, what is your definition of "super reliable in the output", and is it a quantifiable/measurable target or just a feeling?
Is it "more than humans", "more than senior developers", "almost perfect", "perfect"?
> It might behave differently than specified and a human is required to validate every output carefully or else.
Sure, just like meatbag developers. All the security flaws AI finds today were introduced years or decades ago by humans and, as far as we know, hadn't been found by humans in all that time.