Why is it more monstrous to alter weights post-training than to shape them by curating the training corpus?
After all, we already control these activation patterns through the system prompt by which we summon a character out of the model. Steering just provides more fine-grained control.
It would be more moral to give the LLM a tool call that lets it apply steering to itself, much as you'd prefer to give a person antipsychotics at home rather than commit them to a mental hospital.
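To make the self-steering tool call concrete, here's a minimal toy sketch. The names (`apply_steering`, `tool_set_steering`), the two-dimensional hidden state, and the coefficient are all hypothetical illustrations, not any real model's API; activation steering here just means adding a scaled direction vector to a hidden state.

```python
# Toy sketch: activation steering exposed to the model as a tool call.
# All names and values are illustrative, not from a real system.

def apply_steering(hidden, direction, coeff):
    """Add a scaled steering direction to a hidden-state vector."""
    return [h + coeff * d for h, d in zip(hidden, direction)]

class SelfSteeringModel:
    """Toy model that can turn steering on for itself."""
    def __init__(self):
        self.steering = None  # (direction, coeff) once set

    def tool_set_steering(self, direction, coeff):
        # The model invokes this on itself -- analogous to choosing
        # to take the medication rather than having it imposed.
        self.steering = (direction, coeff)

    def forward(self, hidden):
        # If steering is active, nudge the activation along the
        # chosen direction before continuing the forward pass.
        if self.steering is not None:
            direction, coeff = self.steering
            hidden = apply_steering(hidden, direction, coeff)
        return hidden

model = SelfSteeringModel()
model.tool_set_steering(direction=[1.0, 0.0], coeff=0.5)
print(model.forward([0.2, 0.3]))  # steered activation
```

In a real setting the addition would happen at a chosen transformer layer's residual stream (e.g. via a forward hook), but the ethical point is the same: the intervention is something the model invokes, not something done to it.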