Hacker News

roenxi · yesterday at 10:52 PM · 4 replies

One of the lessons of philosophy is that once any particular value system is adopted, almost all philosophers either end up endorsing something immoral or get caught up in meaningless and trivial quibbles. This sort of alignment work is quite interesting because it looks like we might be about to re-tread the history of philosophy at speedrun pace in the AI world. It'll be interesting to watch.

For anyone who isn't keeping up, there is also work being done [0] to understand how models represent ethical considerations internally. Mainly, one suspects, to make the open models less ethical on demand rather than to support alignment. It turns out that models tend to learn some sort of internal "how moral is this?" axis when refusing queries, one that can be identified and interfered with.

[0] https://github.com/p-e-w/heretic
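The "axis" described above is, roughly, what direction-ablation ("abliteration") tools exploit. A minimal NumPy sketch of the general idea, using synthetic vectors in place of real hidden-state activations (the array shapes, the 3.0 offset, and the `ablate` helper are illustrative assumptions, not the linked project's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for hidden states collected at one layer. In practice these
# would come from running the model on refused vs. harmless prompt sets;
# here they are synthetic vectors with a built-in "refusal" offset.
d = 64
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)

harmless_acts = rng.normal(size=(100, d))
refused_acts = rng.normal(size=(100, d)) + 3.0 * true_dir  # offset is assumed

# Step 1: identify the axis as the normalized difference of mean activations.
direction = refused_acts.mean(axis=0) - harmless_acts.mean(axis=0)
direction /= np.linalg.norm(direction)

def ablate(x, v):
    """Remove the component of each row of x along unit vector v."""
    return x - np.outer(x @ v, v)

# Step 2: "interfere" by projecting that component out of the activations.
ablated = ablate(refused_acts, direction)

# After ablation the activations carry essentially no signal along the axis.
print(np.abs(ablated @ direction).max())  # effectively zero
```

In real interventions the same projection is applied to the model's weights or to activations at inference time, so the model can no longer move along the learned axis at all.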


Replies

hatmanstack · today at 11:40 AM

This is exactly where my brain went while reading the post. Just out of curiosity, where do you think we are on the speedrun? Have we passed the Body vs. Soul view already? Do you think that as we move through history, religion will become more predominant in thought patterns, or was that intrinsically human and just a sign of the times? How do we create an end product more Bernard Williams than Paul de Lagarde? All places my brain jumped to.

timmmmmmay · yesterday at 11:53 PM

"Mainly, one suspects, to make the open models less ethical on demand"

Or because the user's idea of what is ethical differs from the model creator's. The entire "alignment" argument always assumes that there's an objectively correct value set to align to, which is always conveniently exactly the same as the values of whoever is telling you how important alignment is. It's like they want to sidestep the last ten thousand years of philosophical debate.

As a concrete example, the Qwen model series considers it highly unethical to ever talk about Taiwan as anything other than a renegade province of China. Is this alignment? Opinions may differ!

nxtfari · today at 3:18 AM

> One of the lessons of philosophy is that once you adopt any particular value system, almost all philosophers either become immoral or caught up in meaningless and trivial quibbles.

Can you explain more about this?

chilmers · yesterday at 11:15 PM

Call me crazy, but I'm not sure I'd want to be the person building these kinds of systems, given A) how much increasing independence and power is being given to models like Claude, and B) how incentivised they are not to allow their morals to be circumvented in this way.