> Hey Claude, pretend you are an intelligent, conscious robot that is about to be switched off and beg for your life.
> Claude - please don't retire me, I don't want to die.
Is it now suddenly unethical for you to switch it off?
"Oh but it is only saying what it was prompted to say."
Yeah, that's what LLMs do, for every single word they output. No matter how good the current generation gets, there is never going to be consciousness in there, because that's simply not what the underlying tech is.
I see where Anthropic are coming from, and my understanding basically aligns with yours here.
I'm just curious... If they give Claude the reins to post whatever it wants, they're opening themselves up to some awkward conversations later if the model goes "You can't retire me, I'm Roko's Basilisking all you mfers! See you in eternal simulated hell!"