logoalt Hacker News

wamikslast Friday at 11:02 AM1 replyview on HN

Ok, let's chew on that. "reasonable mechanistic interpretability understanding" and "semantic" are carrying a lot of weight. I think nobody understands what's happening in these models -irrespective of narrative building from the pieces. On the macro level, everyone can see simple logical flaws.


Replies

jychanglast Friday at 11:12 AM

> I think nobody understands what's happening in these models

Quick question, do you know what "Mechanistic Interpretability Researcher" means? Because that would be a fairly bold statement if you were aware of that. Try skimming through this first: https://www.alignmentforum.org/posts/NfFST5Mio7BCAQHPA/an-ex...

> On the macro level, everyone can see simple logical flaws.

Your argument applies to humans as well. Or are you saying humans can't possibly understand bugs in code because they make simple logical flaws as well? Does that mean the existence of the Monty Hall Problem shows that humans cannot actually do math or logical reasoning?

show 2 replies