Wouldn’t that be extremely computationally expensive, considering how resource-intensive training is?

delis-thumbs-7e • today at 4:39 PM

colechristensen • today at 4:42 PM

No. Training a state-of-the-art model involves training on the order of 10 trillion tokens.

We're talking about a step that updates weights based on, say, somewhere between 10k and 1M tokens.
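
To put those numbers side by side, here is a rough back-of-the-envelope sketch. It uses only the figures quoted above and assumes, as a simplification, that cost scales roughly linearly with the number of tokens processed for a fixed model; the function and constant names are made up for illustration.

```python
# Rough comparison of token counts, using the figures quoted above.
# Assumption: cost is roughly proportional to tokens processed.
PRETRAINING_TOKENS = 10 * 10**12  # "on the order of 10 trillion tokens"
UPDATE_TOKENS_LOW = 10_000        # "10k"
UPDATE_TOKENS_HIGH = 1_000_000    # "1M"


def fraction_of_pretraining(update_tokens: int,
                            pretraining_tokens: int = PRETRAINING_TOKENS) -> float:
    """Return the update's token count as a fraction of the pretraining token count."""
    return update_tokens / pretraining_tokens


low = fraction_of_pretraining(UPDATE_TOKENS_LOW)
high = fraction_of_pretraining(UPDATE_TOKENS_HIGH)
print(f"{low:.0e} to {high:.0e} of the pretraining token count")
```

Under that assumption, even the 1M-token end of the range is about one ten-millionth of the pretraining token count.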

