
sleepyeldrazi, yesterday at 6:10 PM

Finetuning takes few resources; training the base model is the slow and expensive part. Architecturally, the 3.5 models are identical to their 3.6 counterparts, which is why the consensus is that they are probably finetunes rather than models retrained from scratch, much like the finetunes many people publish on Hugging Face.


Replies

genxy, yesterday at 6:49 PM

Understood, but look at their overall release cadence over the years and the breadth of their models. They are clearly not all finetunes. Meta, for all its billions, doesn't have anything comparable.
