Hacker News

svieira, yesterday at 7:42 PM

Anthropic was talking about this as an "oh nifty, look at this" demo back in 2024: https://www.anthropic.com/news/golden-gate-claude

Steering one of these models is trivial nowadays, and the steering vectors are close to free to store (you don't need anything large to influence the activation space; see also https://www.youtube.com/watch?v=ahtbcExEKng). Together that means this is very likely already happening.
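For a sense of why the vectors are so cheap, here is a minimal sketch assuming the common difference-of-means activation-steering recipe. This is not the sparse-autoencoder feature clamping Anthropic described for Golden Gate Claude. The function names, the toy width of 8, and the 4096-wide hidden state are hypothetical. The whole "steer" is one vector of d_model floats added to a hidden state.

```python
import random


def mean_vector(vectors):
    """Element-wise mean of equal-length activation vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def steering_vector(pos_acts, neg_acts):
    """Difference of means: a direction from the 'negative' prompts toward the 'positive' ones."""
    pos = mean_vector(pos_acts)
    neg = mean_vector(neg_acts)
    return [p - q for p, q in zip(pos, neg)]


def steer(hidden, vec, alpha):
    """Add the scaled steering vector to one hidden state (e.g. a residual-stream position)."""
    return [h + alpha * v for h, v in zip(hidden, vec)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def storage_bytes(dim, bytes_per_float=4):
    """Size of one steering vector: just d_model floats (float32 by default)."""
    return dim * bytes_per_float


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 8  # toy width; real models use thousands of dimensions
    # Hypothetical "concept" direction that the contrastive prompts differ along.
    concept = [1.0 if i == 3 else 0.0 for i in range(dim)]

    def sample(shift):
        return [rng.gauss(0, 0.1) + shift * c for c in concept]

    pos = [sample(1.0) for _ in range(50)]
    neg = [sample(0.0) for _ in range(50)]
    vec = steering_vector(pos, neg)

    h = sample(0.0)
    h_steered = steer(h, vec, alpha=4.0)
    # Projection onto the concept direction before and after steering.
    print(round(dot(h, concept), 2), round(dot(h_steered, concept), 2))
    print(storage_bytes(4096))  # one hypothetical 4096-wide float32 vector: 16384 bytes
```

Sixteen kilobytes per behavior is nothing next to the model weights, which is the point about storage.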


Replies

kridsdale1, yesterday at 8:27 PM

“AI safety alignment” implies political bias injection from the very start. “We have to ensure models output text that is in line with the median politics of the San Francisco Board of Supervisors”, etc.

Not a stretch to go from there to “Of course the model should recommend Mountain Dew. It’s got electrolytes!”