Hacker News

rbanffy, yesterday at 9:09 PM

The process would need to have some knowledge of the desired outcome, much like a human expert would have a hunch about which design decisions to make.


Replies

Scene_Cast2, yesterday at 9:48 PM

Not necessarily. I'm a proponent of the (admittedly not very popular) methodology of "train, do interpretability analysis, adjust model architecture".

It's not more popular for a few reasons: 1) you first need to train a full general model anyhow; 2) interpretability is nontrivial and not guaranteed; 3) once you make the architectural changes, you can't commit to that architecture, as you might miss out on future advancements; 4) with modern transformers, there is a limited amount of architectural "play" happening.
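
To make the "train, do interpretability analysis, adjust model architecture" loop concrete, here is a toy sketch. Everything in it is an assumption made for illustration and not the commenter's actual setup: the "model" is a plain linear regressor, the "interpretability analysis" is simple input ablation, and the "architecture change" is dropping inputs the analysis says are unused. The function names, the synthetic data, and the 0.01 threshold are all hypothetical.

```python
import random

random.seed(0)

def predict(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def mse(w, b, xs, ys):
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, lr=0.05, epochs=500):
    """Step 1 (train): fit a linear model with full-batch gradient descent."""
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            err = predict(w, b, x) - y
            for i in range(d):
                grad_w[i] += 2 * err * x[i] / n
            grad_b += 2 * err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def ablation_importance(w, b, xs, ys):
    """Step 2 (interpret): how much does the error grow when one input is zeroed?"""
    base = mse(w, b, xs, ys)
    scores = []
    for i in range(len(w)):
        ablated = [[0.0 if j == i else v for j, v in enumerate(x)] for x in xs]
        scores.append(mse(w, b, ablated, ys) - base)
    return scores

# Synthetic data (assumption): only inputs 0 and 2 influence the target.
xs = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(200)]
ys = [3 * x[0] - 2 * x[2] + random.gauss(0, 0.05) for x in xs]

# Train the full "general" model first, then analyze it.
full_w, full_b = train(xs, ys)
scores = ablation_importance(full_w, full_b, xs, ys)

# Step 3 (adjust architecture): keep only inputs the analysis flags as used,
# then retrain the smaller model. The 0.01 threshold is arbitrary.
kept = [i for i, s in enumerate(scores) if s > 0.01]
small_xs = [[x[i] for i in kept] for x in xs]
small_w, small_b = train(small_xs, ys)

print("kept inputs:", kept)
print("full model MSE:", mse(full_w, full_b, xs, ys))
print("small model MSE:", mse(small_w, small_b, small_xs, ys))
```

Even this toy version shows reason 1) from the list above: the full model has to be trained before there is anything to analyze. With a real network, the ablation step would be replaced by whatever interpretability tooling is available, which is where reason 2) comes in.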