Ok, I'll bite. Let's assume a modern cutting-edge model, even one with fairly standard GQA attention, and something obviously richer than a single monosemantic feature per neuron.
Under any reasonable mechanistic-interpretability understanding of such a model, what's preventing a polysemantic circuit/feature from representing a specific error in your code?
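For concreteness, here's a minimal toy sketch of what I mean by polysemanticity via superposition. This is purely my own illustration with made-up dimensions and sparsity, not anything pulled from an actual model: more sparse features than neurons, packed into a small activation vector, and a single linear direction still tracks one specific feature.

```python
# Toy sketch of polysemanticity via superposition (assumed toy setup):
# more sparse "features" than neurons, yet one direction still reads out
# a specific feature because interference from the others is small.
import numpy as np

rng = np.random.default_rng(0)

n_features, n_neurons = 64, 16                 # more features than dimensions
W = rng.standard_normal((n_features, n_neurons))
W /= np.linalg.norm(W, axis=1, keepdims=True)  # unit-norm feature directions

# Sparse feature activations: each sample turns on only a few features.
n_samples = 10_000
active = rng.random((n_samples, n_features)) < 0.03
x = active * rng.random((n_samples, n_features))

h = x @ W                                      # "neuron" activations in superposition

# Every neuron responds to many features (polysemantic) ...
print("features with non-trivial weight on neuron 0:",
      int((np.abs(W[:, 0]) > 0.1).sum()))

# ... yet projecting onto one feature's direction still tracks that feature.
target = 7
readout = h @ W[target]
corr = np.corrcoef(readout, x[:, target])[0, 1]
print(f"correlation between readout and feature {target}: {corr:.2f}")
```

The point being: nothing in this picture forces one neuron = one concept, and nothing prevents a direction in activation space from corresponding to something as specific as a particular class of bug.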
---
Do you actually understand ML? Or are you just parroting things you don't quite understand?