logoalt Hacker News

silentkatyesterday at 10:07 PM1 replyview on HN

I like to call this Frieren's Demon. In that show, it is explained that demons evolved with no common ancestor to humans, but they speak the language. They learned the language to hunt humans. This leads to a fundamentally different understanding of words and language.

Now, I don't personally believe this is an intelligence at all, but it's possible I'm wrong. What we have with these machines is a different evolutionary reason for it speaking our language (we evolved it to speak our language ourselves). It's understanding of our language, and of our images is completely alien. If it is an intelligence, I could believe that the way it makes mistakes in image generation, and the strange logical mistakes it makes that no human would make are simply a result of that alien understanding.

After all, a human artist learning to draw hands makes mistakes, but those mistakes are rooted in a human understanding (e.g. the effects of perspective when translating a 3D object to 2D). The machine with a different understanding of what a hand is will instead render extra fingers (it does not conceptualize a hand as a 3D object at all).

Though, again, I still just think its an incomprehensible amount of data going through a really impressive pattern matcher. The result is still language out of a machine, which is really interesting. The only reason I'm not super confident it is not an intelligence is because I can't really rule out that I am not an incomprehensible amount of data going through a really impressive pattern matcher, just built different. I do however feel like I would know a real intelligence after interacting with it for long enough, though, and none of these models feel like a real intelligence to me.


Replies

orbital-decayyesterday at 10:58 PM

>it does not conceptualize a hand as a 3D object at all

Oh but it does, it's an emergent property. The biggest finding in Sora was exactly that, an internal conceptualization of the 3D space and objects. Extra fingers in older models were the result of the insufficient fidelity of this conceptualization, and also architectural artifacts in small semantically dense details.