Hacker News

tiagobraw, yesterday at 9:38 PM (1 reply)

Interesting. Claude Opus 4.8 and Gemini 3.1 Lite kind of got it right, but when I ask the models directly, they say they don't know. I'm curious how the tool is doing the correlation.


Replies

turtlesoup, yesterday at 10:23 PM

The prompt for the rollouts is posted below (https://news.ycombinator.com/item?id=48592415). I have a bit more information on the clustering part at https://intheweights.com/about, but everything returned by the model is viewable (possibly under the "hallucinations" section).