logoalt Hacker News

visargayesterday at 6:33 PM1 replyview on HN

Beautiful idea, an autoencoder must represent everything without hiding if is to recover the original data closely. So it trains a model to verbalize embeddings well. This reveals what we want to know about the model (such as when it thinks it is being tested, or other hidden thoughts).


Replies

sobellianyesterday at 7:49 PM

It could just invent its own secret language embedded into English akin to steganography. The explanation would not lose information but would remain uninterpretable by humans