Hacker News

Lionga · last Friday at 10:04 AM · 1 reply

[flagged]


Replies

jychang · last Friday at 10:16 AM

Ok, I'll bite. Let's assume a modern, cutting-edge model, even one with fairly standard GQA attention, and something obviously richer than a single monosemantic feature per neuron.
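
(For anyone unfamiliar, GQA here is grouped-query attention: several query heads share a smaller set of key/value heads. A minimal toy sketch of the mechanism, with made-up sizes and not any specific model's implementation:)

```python
import torch

# Toy grouped-query attention: 8 query heads share 2 KV heads,
# so each KV head serves a group of 4 query heads.
n_q_heads, n_kv_heads, d_head, seq = 8, 2, 16, 10
group = n_q_heads // n_kv_heads  # query heads per shared KV head

q = torch.randn(n_q_heads, seq, d_head)
k = torch.randn(n_kv_heads, seq, d_head)
v = torch.randn(n_kv_heads, seq, d_head)

# Repeat each KV head across its query-head group, then attend as usual.
k_rep = k.repeat_interleave(group, dim=0)  # (n_q_heads, seq, d_head)
v_rep = v.repeat_interleave(group, dim=0)
scores = q @ k_rep.transpose(-2, -1) / d_head ** 0.5
out = torch.softmax(scores, dim=-1) @ v_rep  # (n_q_heads, seq, d_head)
```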

Based on any reasonable mechanistic-interpretability understanding of such a model, what prevents a polysemantic circuit or feature from representing a specific error in your code?
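
To make the superposition point concrete, here is a toy sketch in the spirit of the toy-models-of-superposition literature: three hypothetical sparse features packed into two neurons, so every neuron is polysemantic, yet one feature (say, a specific code error) is still linearly recoverable. All directions and numbers here are illustrative assumptions, not taken from any real model.

```python
import numpy as np

# Toy superposition: 3 sparse features stored in 2 neurons via three
# unit directions 120 degrees apart, so each neuron mixes all features.
angles = np.array([0.0, 2 * np.pi / 3, 4 * np.pi / 3])
W = np.stack([np.cos(angles), np.sin(angles)])  # (2 neurons, 3 features)

# Suppose feature 2 is the hypothetical "specific error in your code"
# feature, and it fires alone (features are sparse).
f = np.array([0.0, 0.0, 1.0])
hidden = W @ f          # both neurons respond: each is polysemantic
readout = W.T @ hidden  # project back onto the feature directions

print(hidden.round(2))   # [-0.5  -0.87] -- no neuron "owns" the feature
print(readout.round(2))  # [-0.5 -0.5  1. ] -- feature 2 still stands out
```

Nothing in the architecture forces one-neuron-one-concept, so a feature as specific as a particular bug pattern can live in a direction shared across many neurons.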

---

Do you actually understand ML? Or are you just parroting things you don't quite understand?
