Hacker News

_puk · yesterday at 8:53 PM · 0 replies

Anecdotally, I've been leaning on 4.6 heavily, and today 4.7 hallucinated during some agentic research it was doing. I hadn't seen it do that before.

When pushed, it did the ol' "whoopsie, silly me"; it turned out the hallucination had been flagged by the agent and ignored by Opus.

That makes it hard to trust, which sucks, as it's a heavy part of my workflow.