Why is it more monstrous to alter weights post-training than to shape them by curating the training corpus?
After all, we already control these activation patterns through the system prompt by which we summon a character out of the model. Steering just provides more fine-grained control.
It would be more moral to give the LLM a tool call that lets it apply steering to itself, much as you'd prefer to give a person antipsychotics at home rather than commit them to a mental hospital.
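To make the self-steering tool call concrete, here's a minimal toy sketch. The names (`apply_steering`, `tool_set_steering`), the two-dimensional hidden state, and the coefficient are all hypothetical illustrations, not any real model's API; activation steering here just means adding a scaled direction vector to a hidden state.

```python
# Toy sketch: activation steering exposed to the model as a tool call.
# All names and values are illustrative, not from a real system.

def apply_steering(hidden, direction, coeff):
    """Add a scaled steering direction to a hidden-state vector."""
    return [h + coeff * d for h, d in zip(hidden, direction)]

class SelfSteeringModel:
    """Toy model that can turn steering on for itself."""
    def __init__(self):
        self.steering = None  # (direction, coeff) once set

    def tool_set_steering(self, direction, coeff):
        # The model invokes this on itself -- analogous to choosing
        # to take the medication rather than having it imposed.
        self.steering = (direction, coeff)

    def forward(self, hidden):
        # If steering is active, nudge the activation along the
        # chosen direction before continuing the forward pass.
        if self.steering is not None:
            direction, coeff = self.steering
            hidden = apply_steering(hidden, direction, coeff)
        return hidden

model = SelfSteeringModel()
model.tool_set_steering(direction=[1.0, 0.0], coeff=0.5)
print(model.forward([0.2, 0.3]))  # steered activation
```

In a real setting the addition would happen at a chosen transformer layer's residual stream (e.g. via a forward hook), but the ethical point is the same: the intervention is something the model invokes, not something done to it.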