Hacker News

xpct · today at 4:10 AM

Sure, but now we have to remodel whatever bias we want for our use case with every new release because the system prompt changes, whereas the underlying data does not.


Replies

stingraycharles · today at 6:10 AM

The underlying data changes all the time, as do training methodologies and preferences.

You do realize that these LLMs are trained on a metric ton of synthetic examples? You describe the kind of behavior you want, let a model generate thousands of examples of that behavior (positive and negative), and feed those to the training process.

So this type of data is cheap to change, and often not even stored (one LLM generates examples while the other trains on them in real time).
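To make the pipeline concrete, here's a minimal sketch of that generate-and-train-on-the-fly loop. Everything here is a hypothetical stub: `teacher_generate` stands in for a real teacher-LLM call, and `train_on_stream` stands in for a real trainer; the point is only that examples stream straight into training batches and are never stored as a dataset.

```python
# Sketch of synthetic-data training (illustrative stubs, not a real API).
import random

def teacher_generate(behavior: str, n: int, seed: int = 0):
    """Stub teacher LLM: yields n synthetic examples of the described
    behavior, alternating positive and negative demonstrations."""
    rng = random.Random(seed)
    for i in range(n):
        label = "positive" if i % 2 == 0 else "negative"
        yield {
            "prompt": f"[{behavior}] case {i}",
            "response": f"{'good' if label == 'positive' else 'bad'} answer {rng.randint(0, 9)}",
            "label": label,
        }

def train_on_stream(examples, batch_size=4):
    """Stub trainer: consumes examples as they are generated, never
    materializing the whole dataset (the 'often not even stored' point)."""
    batch, steps = [], 0
    for ex in examples:
        batch.append(ex)
        if len(batch) == batch_size:
            steps += 1  # a real trainer would take one gradient step here
            batch.clear()
    return steps

steps = train_on_stream(teacher_generate("refuse-medical-advice", 100))
print(steps)  # 25 batches of 4 from 100 streamed examples
```

Swapping in a different behavior description regenerates the whole training signal, which is why rebiasing a model between releases is cheap relative to recollecting human data.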

Here's a decent collection of papers on the topic: https://github.com/pengr/LLM-Synthetic-Data