
bilsbie, today at 12:54 AM

How does this differ from golden gate Claude?


Replies

hijohnnylin, today at 2:06 AM

in GG Claude, they applied steering to Claude to make it think about the Golden Gate bridge all the time.

here, they don't modify/steer the base model. they train other models that specialize in reading the internals of the base model, so that they can surface reasoning/thoughts that the base model might not explicitly tell you.

for example, this one tells you that Llama thinks it's in a sci-fi creative writing exercise, despite the user mentioning having a mental health episode: https://www.neuronpedia.org/nla/cmonzq63g0003rlh8xi9onjnn
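
A toy sketch of the write-versus-read distinction described above, not the actual method used in either project. It assumes a hidden state can be treated as a plain list of floats and that a concept corresponds to a single direction; `steer`, `read`, `activation`, and `bridge_direction` are hypothetical names made up for illustration. The readers mentioned above are trained models, so the dot-product readout here is only a stand-in for "inspect without modifying".

```python
# Toy contrast: steering writes to activations, reading leaves them alone.
# All names and numbers are hypothetical; no real model is involved.

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def steer(activation, direction, strength):
    """Steering: return a NEW activation pushed along `direction`.
    In a steered model, this modified vector replaces the original."""
    return [a + strength * d for a, d in zip(activation, direction)]

def read(activation, direction):
    """Reading: score how strongly `direction` is present.
    The activation itself is not changed."""
    return dot(activation, direction)

activation = [0.2, -0.1, 0.5]        # stand-in for one hidden-state vector
bridge_direction = [1.0, 0.0, 0.0]   # stand-in for a concept direction

steered = steer(activation, bridge_direction, strength=10.0)
score = read(activation, bridge_direction)

print("steered activation:", steered)
print("read-only score:", score)
print("original activation unchanged:", activation)
```

In the steering case the model's downstream computation sees the altered vector, so its behavior changes. In the reading case the base model runs exactly as before and only the observer learns something.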
