Hacker News

zozbot234 | yesterday at 4:57 PM | 1 reply

That's a different kind of steering: it just injects text into the model's natural-language thinking output, or something very similar. There is a middle ground, though. Using Anthropic's NLA work, you can look at the natural-language rendition of a model's activations at a particular layer, edit the text, and convert it back into completely different activations.
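
To make that round trip concrete, here is a toy sketch in Python. It is not Anthropic's method or API: the three-entry `CODEBOOK`, the 3-dimensional "activations", and the functions `verbalize`, `reconstruct`, and `steer_via_text` are all made up for illustration, with a nearest-direction lookup standing in for whatever a real activation-to-text and text-to-activation model would do.

```python
from typing import Callable, Dict, List

Vector = List[float]

# Hypothetical codebook: each label names one direction in a toy
# 3-dimensional activation space. A real system would use learned
# models here, not a lookup table.
CODEBOOK: Dict[str, Vector] = {
    "formal": [1.0, 0.0, 0.0],
    "casual": [0.0, 1.0, 0.0],
    "angry": [0.0, 0.0, 1.0],
}


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def verbalize(activation: Vector) -> str:
    """Toy 'activations -> text': pick the best-matching codebook label."""
    return max(CODEBOOK, key=lambda label: dot(CODEBOOK[label], activation))


def reconstruct(text: str) -> Vector:
    """Toy 'text -> activations': look the label up in the codebook."""
    return list(CODEBOOK[text])


def steer_via_text(activation: Vector, edit: Callable[[str], str]) -> Vector:
    """Describe the activation in text, edit the text, map it back."""
    description = verbalize(activation)
    edited = edit(description)
    return reconstruct(edited)


original = [0.9, 0.2, 0.1]
steered = steer_via_text(original, lambda text: text.replace("formal", "casual"))
```

The point of the sketch is only the shape of the loop: the edit happens on a text description, but what goes back into the model is an activation vector rather than injected tokens.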


Replies

bel8 | yesterday at 5:03 PM

Ahh I see. Thanks for the clarification.