
mrbungie · last Friday at 12:20 PM

Was this "paper" eventually peer reviewed?

PS: I know it's interesting and I don't doubt Anthropic, but I find it fascinating that they get such a pass in science.


Replies

ACCount37 · last Friday at 1:18 PM

Modern ML is old-school mad science.

The lifeblood of the field is proof-of-concept pre-prints built on top of other proof-of-concept pre-prints.

jychang · yesterday at 1:55 AM

This is more of an article describing their methodology than a full paper. But yes, there are plenty of peer-reviewed papers on this topic: scaling sparse autoencoders to produce interpretable features for large models.
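
For anyone unfamiliar, the core recipe is simple. Here's a minimal sketch in PyTorch; the hyperparameters and the random "activations" are placeholders I made up, since real setups train on residual-stream activations collected from the model being studied:

    import torch
    import torch.nn as nn

    class SparseAutoencoder(nn.Module):
        def __init__(self, d_model: int, d_hidden: int):
            super().__init__()
            # Overcomplete dictionary: d_hidden >> d_model in practice.
            self.encoder = nn.Linear(d_model, d_hidden)
            self.decoder = nn.Linear(d_hidden, d_model)

        def forward(self, x: torch.Tensor):
            f = torch.relu(self.encoder(x))  # sparse feature activations
            x_hat = self.decoder(f)          # reconstruction of the input
            return x_hat, f

    d_model, d_hidden = 512, 4096  # illustrative sizes only
    sae = SparseAutoencoder(d_model, d_hidden)
    opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
    l1_coeff = 1e-3  # sparsity strength; tuned per layer in real work

    for step in range(100):
        acts = torch.randn(64, d_model)  # stand-in for real LLM activations
        x_hat, f = sae(acts)
        # Reconstruction loss plus an L1 penalty that pushes the
        # feature activations toward sparsity.
        loss = (x_hat - acts).pow(2).mean() \
             + l1_coeff * f.abs().sum(dim=-1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

The L1 penalty is what forces the overcomplete dictionary to learn sparse features, which is what makes them candidates for human-interpretable concepts. The papers below are largely about scaling and evaluating variations of this setup.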

There have been a ton of peer-reviewed papers on SAEs in the past two years; some of them have been presented at conferences.

For example: "Sparse Autoencoders Find Highly Interpretable Features in Language Models" https://proceedings.iclr.cc/paper_files/paper/2024/file/1fa1...

"Scaling and evaluating sparse autoencoders" https://iclr.cc/virtual/2025/poster/28040

"Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning" https://proceedings.neurips.cc/paper_files/paper/2024/hash/c...

"Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2" https://aclanthology.org/2024.blackboxnlp-1.19.pdf