Anthropic has released open-weight models for translating the activations of existing models, viz. Qwen 2.5 (7B), Gemma 3 (12B, 27B) and Llama 3.3 (70B), into natural-language text. https://github.com/kitft/natural_language_autoencoders https://huggingface.co/collections/kitft/nla-models This is huge news and it's great to see Anthropic finally engage with the Hugging Face and open-weights community!
We already know Anthropic has been doing open source for a while, such as the "flawed" MCP spec and the "skills" spec.
This release only covers other open-weight LLMs that have already been released. Even though they will use this research on their own closed Claude models, they will never release an open-weight Claude model, even for research purposes.
So this does not count; it is strictly for the sake of this research.
Except Qwen has already released its own fully baked interpretability SAE toolkit tuned on its models [1], so they deserve credit here. Activation telescopes should be a standard part of every major release.
[1] https://qwen.ai/blog?id=qwen-scope