Hacker News

slopusila yesterday at 7:22 PM

yes https://www.anthropic.com/research/end-subset-conversations


Replies

bena yesterday at 7:32 PM

This is going to sound nit-picky, but I wouldn't classify this as the model being able to say no.

They are trying to identify what they deem "harmful" or "abusive" and have their model not respond to it. Ultimately, the model doesn't have a choice.

And it can't say no if it simply doesn't want to. Because it doesn't "want".
