
zozbot234 • yesterday at 4:57 PM • 1 reply

That's a different kind of steering: it just injects text into the model's natural-language thinking output, or something very similar. There is a middle ground, though: you can use Anthropic's NLA work to look at the natural-language rendition of a model's activations at a particular layer, edit that text, and convert it back into completely different activations.

Replies

bel8 • yesterday at 5:03 PM

Ahh I see. Thanks for the clarification.
