
yorwba · today at 10:28 AM

The verbalizer and reconstruction models are both initially finetuned on LLM output from a summarization prompt. The resulting text is not completely unrelated to the input, but it's mostly wrong: https://transformer-circuits.pub/2026/nla/png/img_18fcfc16e9... The reconstructed activations are also far from matching the verbalizer's input. It's not unusual in machine learning for results to be shit and SOTA at the same time, simply because no other technique works better.
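
For concreteness, here's a minimal sketch of the round trip being evaluated, with toy linear layers standing in for the actual LLM-based verbalizer and reconstruction models (the names, dimensions, and linear stand-ins are my assumptions, not from the paper):

    # Toy sketch of the verbalize-then-reconstruct round trip.
    # The real models are LLMs; these linear layers are stand-ins.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    D_ACT = 512  # activation dimensionality (assumed)
    D_TXT = 64   # stand-in for the verbalizer's text output (assumed)

    verbalizer = nn.Linear(D_ACT, D_TXT)      # activations -> "text"
    reconstructor = nn.Linear(D_TXT, D_ACT)   # "text" -> activations

    acts = torch.randn(32, D_ACT)             # original activations
    recon = reconstructor(verbalizer(acts))   # round-tripped activations

    # The point above: this gap stays large even after finetuning.
    cos = F.cosine_similarity(recon, acts, dim=-1).mean()
    mse = F.mse_loss(recon, acts)
    print(f"mean cosine similarity: {cos:.3f}, MSE: {mse:.3f}")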