Hacker News

drakythe, today at 3:34 PM

They can't respect boundaries as long as those boundaries exist only in the LLM instruction set. When a human being follows rules long enough, the rules (usually) become second nature, to the point where long-running companies are known for having rules no one understands (Chesterton's Fence is alive and well).

But an LLM has a limited "memory". The instructions might land in there with sufficient priority to be "respected", but it only takes a single instance of that memory getting too full, or of the LLM autocompleting the workaround because that was the statistically "best" solution, and any barriers that exist only in LLM instructions and not in hardcoded guards will evaporate like so much morning fog.
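
To make "hardcoded guard" concrete, here is a rough sketch of what I mean, not anyone's actual implementation. It assumes a hypothetical agent that asks to write files; `ALLOWED_ROOT`, `GuardError`, and `guarded_write_path` are names made up for illustration. The check runs in ordinary code outside the model, so it holds no matter what ended up in the context window.

```python
from pathlib import Path

# Hypothetical policy: the agent may only write inside this directory.
ALLOWED_ROOT = Path("/workspace/project").resolve()


class GuardError(Exception):
    """Raised when a proposed action crosses a hardcoded boundary."""


def guarded_write_path(requested: str) -> Path:
    """Return the resolved target path if it stays inside ALLOWED_ROOT.

    The model only proposes `requested`; this function decides.
    """
    # Resolve ".." segments and absolute paths before checking.
    target = (ALLOWED_ROOT / requested).resolve()
    try:
        target.relative_to(ALLOWED_ROOT)
    except ValueError:
        raise GuardError(f"refusing to write outside {ALLOWED_ROOT}: {target}")
    return target
```

A request like `"../../etc/passwd"` raises `GuardError` whether or not the prompt said "never touch system files", which is the whole point: the instruction can be forgotten, the check can't.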