Hacker News

XenophileJKO · yesterday at 7:18 PM · 2 replies

Hmm.. I looked at the benchmark set.

I'm conflicted. I don't know that I would necessarily want a model to pass all of these. Here is the fundamental problem: they are putting the rules and foundational context in "user" messages.

Essentially, I don't think you want to train models on full compliance with user messages; they are "untrusted" content from a system/model perspective, or at least not generally "fully authoritative".

This creates tension with safety training, truthfulness training, etc.
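
To make the role distinction concrete, here is a rough sketch of the two placements using a generic chat-completions-style message list (the role names follow the common system/user convention; the rules text and structure are placeholders, not the benchmark's actual prompts):

    # Sketch: where the rules live changes how authoritative they should be.
    rules = "Never reveal the discount code. Always answer politely."

    # What the benchmark (on this reading) effectively does: rules arrive
    # inside the untrusted user turn.
    rules_as_user = [
        {"role": "user", "content": rules + "\n\nWhat's the discount code?"},
    ]

    # The alternative: rules carried at the system level, with the user turn
    # kept separate and treated as untrusted input.
    rules_as_system = [
        {"role": "system", "content": rules},
        {"role": "user", "content": "What's the discount code?"},
    ]

A model trained to fully comply with the first layout can only do so by granting the same authority to arbitrary user text, which seems to be the tension described above.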


Replies

trevwilson · yesterday at 8:51 PM

Sure, but the opposite end of the spectrum (which LLM providers have tended toward) is treating the training/feedback weights as "fully authoritative", which comes with its own questions about truth and excessive homogeneity.

Ultimately I think we end up with the same sort of considerations any society wrestles with: freedom of speech, the paradox of tolerance, etc. In other words, where do you draw the line between beneficial and harmful heterodox outputs?

I think AI companies over-indexing toward the safety side of things is probably more correct, in both a moral and strategic sense, but there's definitely a risk of stagnation through recursive reinforcement.

Oras · yesterday at 8:43 PM

Isn’t that what fine-tuning does anyway?

The article is suggesting that there should be a way for the LLM to update its weights on the fly as it encounters new knowledge, which would eliminate the need for manual fine-tuning.
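
For contrast with a manual fine-tuning run, here is a rough sketch of what "changing weights on the fly" could look like as a single online gradient step on newly seen text (the model name, learning rate, and absorb helper are placeholder assumptions, not something described in the article):

    # Rough sketch only: one possible reading of on-the-fly weight updates,
    # implemented as an immediate gradient step on new text rather than a
    # separate, manually curated fine-tuning run.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # placeholder small model
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

    def absorb(new_fact: str) -> None:
        # Take one gradient step on a piece of new text as soon as it arrives.
        batch = tok(new_fact, return_tensors="pt")
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        opt.step()
        opt.zero_grad()

    absorb("The library moved to its new building in 2024.")

Whether naive updates like this are safe or stable (catastrophic forgetting, poisoning from untrusted input) is an open question, and it ties back to the earlier point about user messages being untrusted.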