Hacker News

ehnto · today at 8:08 AM

They are also not getting the same quantity or quality of data as was possible in the first years of "ingest". Compared to the beginning, from here on it is more of a drip feed of new training data: still immense volumes, but roughly one year of society's data production at a time, versus centuries of text and data ingested in a short time frame.


Replies

nayroclade · today at 9:37 AM

For pre-training, yes. But for post-training you need high-quality labelled datasets for reinforcement learning. So far AI has been most successful in coding, because usage translates directly into such datasets, producing a virtuous cycle: more usage produces more data, which produces better models, which drives more usage.
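A minimal sketch of what "translating usage into a dataset" could look like for code: a model-generated solution is executed against test cases, and the pass/fail outcome becomes a reward label attached to the (prompt, completion) pair. All names here (`run_tests`, `label_example`, the `add` entry point) are illustrative assumptions, not any lab's actual pipeline.

```python
def run_tests(solution_src: str, tests: list) -> bool:
    """Execute a candidate solution and check it against test cases."""
    namespace = {}
    try:
        exec(solution_src, namespace)  # run the generated code
        fn = namespace["add"]          # assumed entry-point name
        return all(fn(*args) == expected for args, expected in tests)
    except Exception:
        return False  # any crash or wrong output counts as failure

def label_example(prompt: str, completion: str, tests: list) -> dict:
    """Turn one usage event into a labelled RL training record."""
    reward = 1.0 if run_tests(completion, tests) else 0.0
    return {"prompt": prompt, "completion": completion, "reward": reward}

# One usage event: a prompt, a model's answer, and checkable tests.
record = label_example(
    prompt="Write a function add(a, b) that returns a + b.",
    completion="def add(a, b):\n    return a + b",
    tests=[((1, 2), 3), ((-1, 1), 0)],
)
print(record["reward"])  # → 1.0
```

The point of the sketch is that the reward signal is automatic: no human labeller is needed, which is exactly what is harder to replicate in medicine or law, where "did it work?" cannot be checked by running tests.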

The question is whether this same model can successfully be applied in disciplines like medicine, law, engineering, etc.