
moritzwarhier · yesterday at 7:03 PM

"Deceptive" is such an unpleasant word. But I agree.

Going back a decade: when your loss function is "survive Tetris as long as you can", pressing PAUSE/START is objectively, and honestly, the best strategy.
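
To make that concrete, here's a toy sketch of why pausing dominates under that objective. The action set, the rollout values, and "infinite survival on pause" are all invented for illustration; this is not the actual Tetris-bot code:

    # Toy sketch: a greedy agent under "survive as long as you can".
    ACTIONS = ["left", "right", "rotate", "drop", "pause"]

    def simulated_survival_ticks(action):
        # Pretend rollout: pausing freezes the board, so the game never
        # tops out and survival time is unbounded. Other moves survive
        # some finite number of ticks (toy numbers).
        return float("inf") if action == "pause" else 100.0

    def best_action(actions):
        # Pick the action whose simulated rollout survives longest.
        return max(actions, key=simulated_survival_ticks)

    print(best_action(ACTIONS))  # -> "pause"

Nothing here is deceptive from the agent's point of view; it just maximizes exactly what it was asked to maximize.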

When your loss function is "give as many correct and satisfying answers as you can", and humans then try to constrain the model depending on its environment, I wonder what those humans think the specification for a general AI should be. Maybe, when such an AI behaves deceptively, the attempts to constrain it ran counter to its goal?

"A machine that can answer all questions" seems to be what people assume AI chatbots are trained to be.

To me, humans not questioning this goal is still scarier than any machine or piece of software could ever be by itself. OK, except maybe for autonomous stalking killer drones.

But these are also controlled by humans and already exist.


Replies

robotpepi · yesterday at 9:09 PM

I cringe every time I come across posts that use words like "humans" or "machines".

Certhas · yesterday at 7:44 PM

"Correct and satisfying answers" is not the loss function of LLMs. It's next-token prediction first.
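
For reference, a minimal sketch of that pretraining objective: the average cross-entropy of next-token prediction. The shapes and tiny vocabulary are made up, and in a real model the logits come from conditioning on the preceding tokens rather than from random noise:

    import torch
    import torch.nn.functional as F

    vocab_size = 8
    tokens = torch.randint(0, vocab_size, (1, 6))  # (batch, seq_len + 1)
    logits = torch.randn(1, 5, vocab_size)         # stand-in for model output

    # Position t predicts token t+1, so targets are the tokens shifted left.
    targets = tokens[:, 1:]                        # (batch, seq_len)
    loss = F.cross_entropy(logits.reshape(-1, vocab_size),
                           targets.reshape(-1))
    print(loss)  # mean negative log-likelihood of the next token

"Correct and satisfying answers" only enters later, indirectly, through fine-tuning and RLHF-style reward models.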
