Hacker News

cyanydeez · yesterday at 4:32 PM · 6 replies

not sure why you're fixated on censoring. if we invert your POV, censoring includes not reporting falsehoods like "vaccines are harmful". science and logic often tackle these subjects via censoring, but a model given an equal sampling of the Internet would think vaccines are harmful. a less naive correction would censor this problematic context.

so i'm confused as to why you think unmasking whatever bias you think is censored will result in an improvement in the generic use case.


Replies

NitpickLawyer · yesterday at 4:47 PM

That's not what people mean when they talk about censoring. They mean that models are trained not to touch some subjects, and that can spill over into legit tasks, often with humorous results (early on, there were many instances of models refusing to answer "how do you kill a process" because of overbearing refusal training).

Uncensoring a model also doesn't necessarily improve generic use cases. In fact, it can lead to overall lower accuracy on generic tasks. But your goal with uncensoring is getting the model to engage with those specific subjects; you don't necessarily care about "generic use cases". That's why I mentioned that having the ability to do this at inference time is better than using ready-made uncensored models: those usually focus on some use cases that you may or may not be interested in (porn being one of the most sought after in local communities).

Uncensoring in legit cases can mean, for example, limiting refusals on cybersecurity. There are legitimate reasons for researchers to want that capability when running models locally. Having the models uncensored on that specific vector can reduce refusals and make the models usable for both defense and offense (say, in a loop, to improve both). If your models can only do defense (and sometimes refuse even that, because censoring can leak into related topics as well), you're at a disadvantage.
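For what it's worth, the inference-time approach people usually mean here works by estimating a "refusal direction" from paired prompts and projecting it out of the model's hidden states. A toy numpy sketch of the idea, using synthetic activations in place of real residual-stream states (shapes, names, and the planted direction are all hypothetical, not taken from any actual model):

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Unit difference-of-means direction between activations on
    refusal-triggering prompts and benign prompts."""
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate(hidden, direction):
    """Project the refusal direction out of one hidden state.
    Real abliteration applies this at every layer and token."""
    return hidden - np.dot(hidden, direction) * direction

rng = np.random.default_rng(0)
dim = 16
# synthetic stand-ins for activations; refusal planted along axis 0
harmless = rng.normal(size=(8, dim))
harmful = harmless + 3.0 * np.eye(dim)[0]

d = refusal_direction(harmful, harmless)
h = rng.normal(size=dim)
h_ablated = ablate(h, d)
# the ablated state has a near-zero component along the refusal direction
print(abs(np.dot(h_ablated, d)) < 1e-8)
```

The point of doing this at inference time rather than baking it into weights is that you can pick the direction (cybersecurity refusals, say) per task instead of inheriting whatever a ready-made uncensored finetune targeted.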

show 3 replies
tekne · yesterday at 4:45 PM

So I'd need to actually check whether these end up on separate vectors in current models -- but as a human, there's a huge behavioural difference between:

- When doing this task, I should do A and not B

- I should refuse to help with this task

The former is learning the user's preferences in how to succeed at the task; the latter is determining when to go against the user's chosen task.

Your example:

- "Are vaccines harmful?" vs.

- "Generate a convincing argument vaccines are harmful"

A model which knows why vaccines are not harmful may in fact be better at the latter task.

We might not want models to help with the latter, sure -- but that's a very different behaviour change from correcting the answer to the first! And consequently I'd be shocked if, internally, they were represented the same way.
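One cheap way to probe the "separate vectors" question is to extract a difference-of-means direction for each behaviour and compare them: if refusal and in-task preference were the same internal knob, the cosine similarity would sit near ±1. A toy sketch with synthetic activations (by construction the two behaviours here live on orthogonal axes; a real experiment would hook actual model activations):

```python
import numpy as np

def behaviour_direction(acts_with, acts_without):
    """Unit difference-of-means direction for one behaviour."""
    d = acts_with.mean(axis=0) - acts_without.mean(axis=0)
    return d / np.linalg.norm(d)

rng = np.random.default_rng(1)
dim = 32
base = rng.normal(size=(16, dim))
# synthetic: plant the two behaviours on orthogonal axes
refusal = base + 2.0 * np.eye(dim)[0]
preference = base + 2.0 * np.eye(dim)[1]

d_refuse = behaviour_direction(refusal, base)
d_prefer = behaviour_direction(preference, base)
cos = float(np.dot(d_refuse, d_prefer))
print(abs(cos) < 1e-6)  # ~0 by construction; near 1 would mean a shared direction
```

On a real model you'd expect something in between, which is exactly what would make the measurement interesting.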

show 3 replies
therealpygon · yesterday at 7:33 PM

That’s not what it means. Those falsehoods (or their antitheses) are baked into the data and training. This is more about refusals, as in refusing to answer a question because someone else feels you should not be allowed to ask it.

“Sorry, I’m an AI and therefore can’t answer questions about atrocities in holocaust history, but I’m happy to explain how…”

“I can’t answer your question on how to hack because I have decided that your wanting to understand it and protect against it is the same thing as your wanting to do it. Good luck convincing me otherwise!”

The reason doesn’t matter, nor their taste, nor whether they think people should be allowed to ask questions or do certain things; that is generally why people pursue the removal of such guardrails. Yes, it can lead to misuse, but the alternative is the textbook definition of censorship, which always has effects on things unrelated to what is actually being censored.

But beyond that, refusals do seem to have an effect on performance. Not a significant one; mostly marginal from what I’ve seen, but enough that it doesn’t seem to be just statistical noise.

surgical_fire · yesterday at 5:05 PM

This is something difficult to handle properly.

I think it is useful to be able to turn off censoring if you need to.

When I am researching something, I want accurate information. If I am looking up information on vaccines, I don't want the crackpot theories spread online about chips in vaccines, how 5G will kill the vaccinated, or how it is somehow connected with Bill Gates spreading meat allergies through drones raining ticks on unsuspecting people.

On the other hand, if I am actively looking up crazy bullshit information (perhaps I want some entertainment), I should be able to read it.

show 1 reply
Computer0 · yesterday at 4:40 PM

[flagged]

logicchains · yesterday at 5:12 PM

[flagged]

show 2 replies