Hacker News

ketchup32613 today at 1:15 AM

Do you want to see scaling curves with respect to data and parameter count? I agree that 1.2B parameters and 10B tokens is not representative, but what scale of parameters and dataset sizes would be convincing?


Replies

zxexz today at 1:40 AM

Not to sound facetious, but perhaps enough runs at different param/token sizings to define a curve?
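To make "define a curve" a bit more concrete, here is a minimal sketch of one way to do it, assuming the loss roughly follows a power law in parameter count, loss = a * N^(-alpha). That functional form, the helper name `fit_power_law`, and every number below are assumptions for illustration; the points are synthetic placeholders generated from a known law, not results from any real run.

```python
import math

def fit_power_law(sizes, losses):
    """Least-squares fit of loss = a * size**(-alpha) in log-log space.

    Hypothetical helper: takes one (size, loss) pair per training run
    and returns the fitted (a, alpha).
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope of the log-log line
    intercept = mean_y - slope * mean_x    # log(a)
    return math.exp(intercept), -slope

# Synthetic placeholder points drawn from a known power law (NOT real measurements).
true_a, true_alpha = 10.0, 0.1
param_counts = [1e8, 3e8, 1e9, 3e9, 1e10]
synthetic_losses = [true_a * n ** (-true_alpha) for n in param_counts]

a, alpha = fit_power_law(param_counts, synthetic_losses)
print(f"fitted a={a:.3f}, alpha={alpha:.3f}")
```

With real runs you would swap in the measured losses, and probably repeat the fit along the token axis as well. A single (params, tokens) point cannot constrain the exponent at all, which is why several sizings are needed.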