
walthamstow · yesterday at 1:08 PM · 14 replies

The eating disorder section is kind of crazy. Are we going to incrementally add sections for every 'bad' human behaviour as time goes on?


Replies

embedding-shape · yesterday at 1:18 PM

Even better, adding it to the system prompt is a temporary fix; then they'll work it into post-training, so the next model release will probably remove it from the system prompt. At least when it's in the system prompt we get some visibility into what's being censored. Once it's in the model, it'll be a lot harder to understand why "How many calories does 100g of Pasta have?" only returns "Sorry, I cannot divulge that information".

zozbot234 · yesterday at 10:02 PM

That part of the system prompt is just stating that telling someone with an actual eating disorder to start counting calories or micro-manage their eating in other ways (a suggestion the model might well give to an average person for the sake of argument, who would then understand it sensibly and take it with a grain of salt) is likely to make them worse off, not better off. This seems like a common-sense addition. It should not trigger any excess refusals on its own.

jeffrwells · yesterday at 9:07 PM

Another way to think about it: every single user of Claude is paying an extra tax in every single request
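The "tax" here is that the full system prompt is resent as input tokens on every request. A back-of-the-envelope sketch of that overhead (all figures below are hypothetical placeholders, not real Anthropic prompt sizes or prices):

```python
# Rough cost of a long system prompt that is resent with every request.
# Token count and per-token price are made-up illustrative numbers.

def system_prompt_overhead(prompt_tokens: int,
                           price_per_mtok_usd: float,
                           requests: int) -> float:
    """Extra input-token cost attributable to the system prompt alone."""
    return prompt_tokens * (price_per_mtok_usd / 1_000_000) * requests

# e.g. a hypothetical 10k-token system prompt at a hypothetical
# $3 per million input tokens, across 1 million requests:
cost = system_prompt_overhead(10_000, 3.0, 1_000_000)
print(f"${cost:,.0f}")  # $30,000
```

The point of the sketch: the overhead scales linearly with both prompt length and request volume, so every section added to the prompt is paid for on every single call.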

WarmWash · yesterday at 1:54 PM

When you are worth hundreds of billions, people start falling over themselves running to file lawsuits against you. We're already seeing this happen.

So spending $50M to fund a team to weed out "food for crazies" becomes a no-brainer.

ikari_pl · yesterday at 10:20 PM

Are these prompts used by both the desktop app (the typical chatbot interface) and Claude Code?

Because it's a waste of my money to check, on every turn, that my Object Pascal compiler isn't developing an eating disorder.

seba_dos1 · yesterday at 9:59 PM

It feels like half of AI research is math, and the other half is coming up with yet another way to state "please don't do bad things" in the prompt that will surely work this time, I promise.

rzmmm · yesterday at 2:46 PM

The alignment favors supporting healthy behaviors, so it can be a fine line. I see the system prompt as a "plan B" for when they can't achieve good results in the training itself.

It's a particularly sensitive issue, so they are probably just being cautious.

mohamedkoubaa · yesterday at 10:44 PM

Starting to feel like a "we were promised flying cars, but all we got…" kind of moment

l5870uoo9y · yesterday at 9:50 PM

Could be that Claude has particularly controversial opinions on eating disorders.

newZWhoDis · yesterday at 10:15 PM

>the year is 2028
>5M of your 10M context window is the system prompt

ls612 · yesterday at 9:23 PM

Yup. Anyone who is surprised by this has not been paying attention to the centralization of power on the internet in the past 10 years.

felixgallo · yesterday at 1:18 PM

I mean, that's what humans have always done with our morals, ethics, and laws, so what alternative improvement do you have to make here?

idiotsecant · yesterday at 1:40 PM

Imagine the kind of human that never adapts their moral standpoints. Ever. They believe what they believed when they were 12 years old.

Letting the system improve over time is fine. The system prompt is an inefficient place to do it, but it's just a patch until the model can be updated.