
logicchains • yesterday at 8:21 PM

For reasoning, a manually curated dataset is too small; you need to be able to automatically generate vast volumes of synthetic reasoning data with provably correct answers. That's presumably why Claude and GPT are so good at using Lean (the theorem prover): they get fed a bunch of synthetic, verifiably correct training data.
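
A minimal sketch of what "synthetic data with provably correct answers" could look like, assuming a toy domain (integer arithmetic) instead of Lean proofs. The names `make_example` and `generate_dataset` are made up for illustration, and nothing here reflects how any lab actually builds its training data. The point is only that the answer is computed while the question is constructed, so every pair is correct by construction and can be re-checked mechanically.

```python
import random


def make_example(rng: random.Random, depth: int = 2) -> tuple[str, int]:
    """Build a random arithmetic expression and its exact value together."""
    if depth == 0:
        n = rng.randint(0, 9)
        return str(n), n
    left_s, left_v = make_example(rng, depth - 1)
    right_s, right_v = make_example(rng, depth - 1)
    op = rng.choice("+-*")
    # The value is derived from the sub-values, not guessed or labeled by hand.
    value = {
        "+": left_v + right_v,
        "-": left_v - right_v,
        "*": left_v * right_v,
    }[op]
    return f"({left_s} {op} {right_s})", value


def generate_dataset(n: int, seed: int = 0) -> list[tuple[str, int]]:
    """Generate n (question, answer) pairs; the seed makes runs repeatable."""
    rng = random.Random(seed)
    return [make_example(rng) for _ in range(n)]


if __name__ == "__main__":
    for question, answer in generate_dataset(3):
        print(f"{question} = {answer}")
```

With a theorem prover the same pattern would apply in principle: a generator proposes statements and proofs, and the proof checker plays the role that exact arithmetic plays here, rejecting anything it cannot verify.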

Replies

mirekrusin • today at 1:30 PM

Wikipedia is a lot of data as well, but we manage to do it, no?
