sleepyeldrazi • yesterday at 6:10 PM • 1 reply

Finetuning takes relatively few resources; training the base model is the slow and expensive part. Architecturally, the 3.5 models are identical to their 3.6 counterparts, which is why the consensus is that the 3.6 models are probably finetunes rather than models retrained from scratch, much like the finetunes you'll see many people publish on Hugging Face.
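To put rough numbers on the cost gap, here is a back-of-the-envelope sketch using the common rule of thumb that dense transformer training costs about 6 × N × D FLOPs (N = parameters, D = training tokens). The model size and token counts are hypothetical placeholders, not figures for the 3.5 or 3.6 models.

```python
# Rough training-compute comparison using the ~6 * N * D FLOPs rule of thumb.
# All numbers are hypothetical placeholders, not figures for any real model.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate forward+backward FLOPs for training a dense transformer."""
    return 6.0 * n_params * n_tokens

N_PARAMS = 7e9            # hypothetical 7B-parameter model
PRETRAIN_TOKENS = 15e12   # hypothetical pretraining corpus: 15T tokens
FINETUNE_TOKENS = 10e9    # hypothetical full finetuning run: 10B tokens

pretrain = training_flops(N_PARAMS, PRETRAIN_TOKENS)
finetune = training_flops(N_PARAMS, FINETUNE_TOKENS)

print(f"pretraining: {pretrain:.2e} FLOPs")
print(f"finetuning:  {finetune:.2e} FLOPs")
print(f"ratio:       {pretrain / finetune:.0f}x")
```

Because N appears in both estimates, the ratio reduces to the ratio of token counts: a full finetune on a small fraction of the pretraining data costs a correspondingly small fraction of the compute.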

Replies

genxy • yesterday at 6:49 PM

Understood, but look at their overall release cadence over the years and the breadth of their models. They are clearly not all finetunes. Meta, for all its billions, doesn't have anything comparable.

