
walthamstow · yesterday at 1:08 PM · 14 replies

The eating disorder section is kind of crazy. Are we going to incrementally add sections for every 'bad' human behaviour as time goes on?


Replies

embedding-shape · yesterday at 1:18 PM

Even better, adding it to the system prompt is a temporary fix; then they'll work it into post-training, so the next model release will probably remove it from the system prompt. At least when it's in the system prompt we get some visibility into what's being censored. Once it's in the model, it'll be a lot harder to understand why "How many calories does 100g of Pasta have?" only returns "Sorry, I cannot divulge that information".

zozbot234 · yesterday at 10:02 PM

That part of the system prompt is just stating that telling someone with an actual eating disorder to start counting calories or micro-manage their eating in other ways (a suggestion the model might well give to an average person for the sake of argument, who would then understand it sensibly and take it with a grain of salt) is likely to make them worse off, not better off. This seems like a common-sense addition. It should not trigger any excess refusals on its own.

jeffrwells · yesterday at 9:07 PM

Another way to think about it: every single user of Claude is paying an extra tax in every single request
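The "tax" here is that the full system prompt is resent as input tokens on every request. A back-of-the-envelope sketch of that overhead (all figures below are hypothetical placeholders, not real Anthropic prompt sizes or prices):

```python
# Rough cost of a long system prompt that is resent with every request.
# Token count and per-token price are made-up illustrative numbers.

def system_prompt_overhead(prompt_tokens: int,
                           price_per_mtok_usd: float,
                           requests: int) -> float:
    """Extra input-token cost attributable to the system prompt alone."""
    return prompt_tokens * (price_per_mtok_usd / 1_000_000) * requests

# e.g. a hypothetical 10k-token system prompt at a hypothetical
# $3 per million input tokens, across 1 million requests:
cost = system_prompt_overhead(10_000, 3.0, 1_000_000)
print(f"${cost:,.0f}")  # $30,000
```

The point of the sketch: the overhead scales linearly with both prompt length and request volume, so every section added to the prompt is paid for on every single call.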

WarmWash · yesterday at 1:54 PM

When you are worth hundreds of billions, people start falling over themselves running to file lawsuits against you. We're already seeing this happen.

So spending $50M to fund a team to weed out "food for crazies" becomes a no-brainer.

ikari_pl · yesterday at 10:20 PM

Are these prompts used by both the desktop app (the typical chatbot interface) and Claude Code?

Because it's a waste of my money to check, on every turn, that my Object Pascal compiler isn't developing an eating disorder.

seba_dos1 · yesterday at 9:59 PM

It feels like half of AI research is math, and the other half is coming up with yet another way to state "please don't do bad things" in the prompt that will surely work this time, I promise.

rzmmm · yesterday at 2:46 PM

The alignment favors supporting healthy behaviors, so it can be a fine line. I see the system prompt as a "plan B" for when they can't achieve good results in the training itself.

It's a particularly sensitive issue, so they are probably just being cautious.

mohamedkoubaa · yesterday at 10:44 PM

Starting to feel like a "we were promised flying cars, but all we got…" kind of moment

l5870uoo9y · yesterday at 9:50 PM

Could be that Claude has particularly controversial opinions on eating disorders.

newZWhoDis · yesterday at 10:15 PM

>the year is 2028
>5M of your 10M context window is the system prompt

ls612 · yesterday at 9:23 PM

Yup. Anyone who is surprised by this has not been paying attention to the centralization of power on the internet in the past 10 years.

felixgallo · yesterday at 1:18 PM

I mean, that's what humans have always done with our morals, ethics, and laws, so what alternative improvement do you have to make here?

idiotsecant · yesterday at 1:40 PM

Imagine the kind of human that never adapts their moral standpoints. Ever. They believe what they believed when they were 12 years old.

Letting the system improve over time is fine. The system prompt is an inefficient place to do it, but it's just a patch until the model can be updated.