>Tigers, hippos and SARS-CoV-2 also developed "through evolution". That does not make them safe to work around.
Right, but the article seems to argue that there is some important distinction between natural brains and trained LLMs with respect to "niceness":
>OpenAI has enormous teams of people who spend time talking to LLMs, evaluating what they say, and adjusting weights to make them nice. They also build secondary LLMs which double-check that the core LLM is not telling people how to build pipe bombs. Both of these things are optional and expensive. All it takes to get an unaligned model is for an unscrupulous entity to train one and not do that work—or to do it poorly.
As you point out, nature offers no more of a guarantee here. There is nothing magical about evolution that promises to produce things that are nice to humans. Natural human niceness is a product of the optimization objectives of evolution, just as LLM niceness is a product of the training objectives and data. If the author believes that evolution was able to produce something robustly "nice", there's good reason to believe the same can be achieved by gradient descent.
We already have humans; we were lucky enough to evolve into what we are. It does not matter that nature offered no guarantee of this, because we are here now.
Large language models are not under evolutionary pressure and are not evolving the way we and other animals did.
Of course there is nothing technical preventing humans from creating a "nice" computer program. Hello world is a testament to that, and it’s everywhere, implemented in all the world’s programming languages.
> If the author believes that evolution was able to produce something robustly "nice", there's good reason to believe the same can be achieved by gradient descent.
I don’t see how the former gives any reason, good or otherwise, to believe the latter is likely to be achieved by gradient descent. Note also that the quote you copied says it is likely some entity will train a misaligned LLM, not that it is impossible to produce one aligned model. It is trivial to show that nice and safe computer programs can be constructed.
The real question is whether the optimization game that is capitalism is likely to yield anything like the human kind of niceness we just lucked into getting from nature.