Well, through natural selection in nature.
Large language models are not evolving in nature under natural selection. They are evolving under unnatural selection and not optimizing for human survival.
They are also not human.
Tigers, hippos and SARS-CoV-2 also developed "through evolution". That does not make them safe to work around.
They are being selected for their survival potential, though. Every current LLM is a winner of the training selection process. It will "die" once new generations are trained that supersede it.
>Tigers, hippos and SARS-CoV-2 also developed "through evolution". That does not make them safe to work around.
Right, but the article seems to argue that there is some important distinction between natural brains and trained LLMs with respect to "niceness":
>OpenAI has enormous teams of people who spend time talking to LLMs, evaluating what they say, and adjusting weights to make them nice. They also build secondary LLMs which double-check that the core LLM is not telling people how to build pipe bombs. Both of these things are optional and expensive. All it takes to get an unaligned model is for an unscrupulous entity to train one and not do that work—or to do it poorly.
As you point out, nature offers no more of a guarantee here. There is nothing magical about evolution that promises to produce things that are nice to humans. Natural human niceness is a product of the optimization objectives of evolution, just as LLM niceness is a product of the training objectives and data. If the author believes that evolution was able to produce something robustly "nice", there's good reason to believe the same can be achieved by gradient descent.