Hacker News

jychang, last Friday at 10:22 AM

Nice LLM-generated text.

Now go read https://transformer-circuits.pub/2024/scaling-monosemanticit... or https://arxiv.org/abs/2506.19382 to see why that text is outdated. Or read any paper in the entire field of mechanistic interpretability (from the past year or two), really.

Hint: the first paper is titled "Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet" and you can ctrl-f for "We find three different safety-relevant code features: an unsafe code feature 1M/570621 which activates on security vulnerabilities, a code error feature 1M/1013764 which activates on bugs and exceptions"

Who said I want a discussion? I want ignorant people to STOP talking, instead of talking as if they knew everything.


Replies

emp17344, last Friday at 6:42 PM

Your entire argument is derived from a pseudoscientific field without any peer-reviewed research. Mechanistic interpretability is a joke invented by AI firms to sell chatbots.
