Hacker News

zozbot234, yesterday at 6:56 PM

Anthropic has released open-weight models for translating the activations of existing models, viz. Qwen 2.5 (7B), Gemma 3 (12B, 27B), and Llama 3.3 (70B), into natural-language text. https://github.com/kitft/natural_language_autoencoders https://huggingface.co/collections/kitft/nla-models This is huge news, and it's great to see Anthropic finally engage with the Hugging Face and open-weights community!


Replies

jimmySixDOF, today at 9:48 AM

Except Qwen already released their own fully baked interpretability SAE toolkit, tuned on their models, so they deserve credit here. Activation telescopes should be a standard part of every major release.

[1] https://qwen.ai/blog?id=qwen-scope

rvz, yesterday at 8:34 PM

We already know Anthropic has been doing open source for a while, such as the "flawed" MCP spec and the "skills" spec.

This release only covers other open-weight LLMs that have already been released. Even though they will use this research on their own closed Claude models, they will never release an open-weight Claude model, even for research purposes.

So this does not count; it is done purely for the sake of this research.
