I don’t understand why there isn’t public dataset for reasoning that can be improved by humans/...

mirekrusin • yesterday at 7:51 PM • 2 replies

I don’t understand why there isn’t a public dataset for reasoning that can be improved by humans/LLMs, like Wikipedia (i.e., with auto-judging of contributions, etc.).

Replies

woctordho • today at 6:47 AM

There is already a lot of effort going into collecting agent traces, including reasoning. See, e.g., the recent discussion: https://old.reddit.com/r/LocalLLaMA/comments/1u795pb/donate_...

We've been developing DataClaw for this: https://github.com/peteromallet/dataclaw

logicchains • yesterday at 8:21 PM

For reasoning, a manually curated dataset is too small; you need to be able to automatically generate vast volumes of synthetic reasoning data with provably correct answers. That's presumably why Claude and GPT are so good at using Lean (the theorem prover): they get fed a bunch of synthetic, verifiably correct training data.
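
To make "synthetic reasoning data with provably correct answers" concrete, here is a toy sketch in Python. It is only an illustration under my own assumptions, not how any lab actually builds its training data: the names `make_example`, `check_step`, and `verify` are hypothetical, and simple arithmetic stands in for what a proof checker like Lean would do on real theorems. The point is that a generator produces a problem plus a reasoning trace, and an independent checker accepts the trace only if every step holds.

```python
import operator
import random

# Operators the toy checker understands (an assumption of this sketch).
OPS = {"+": operator.add, "*": operator.mul}


def make_example(rng):
    """Generate one arithmetic problem with a step-by-step trace and answer."""
    a, b, c = (rng.randint(2, 99) for _ in range(3))
    total = a + b
    answer = total * c
    return {
        "question": f"What is ({a} + {b}) * {c}?",
        "steps": [f"{a} + {b} = {total}", f"{total} * {c} = {answer}"],
        "answer": answer,
    }


def check_step(step):
    """Recompute a single 'x op y = z' step and confirm that z is correct."""
    lhs, rhs = step.split(" = ")
    x, op, y = lhs.split()
    return OPS[op](int(x), int(y)) == int(rhs)


def verify(example):
    """Accept an example only if every step checks out and the final
    step's result matches the stated answer."""
    steps = example["steps"]
    if not steps or not all(check_step(s) for s in steps):
        return False
    return int(steps[-1].split(" = ")[1]) == example["answer"]


# Generate a batch and keep only the examples that pass verification.
rng = random.Random(0)
dataset = [ex for ex in (make_example(rng) for _ in range(1000)) if verify(ex)]
```

The same `verify` function could also serve as an automatic judge for contributed traces: a submission whose steps do not check out is rejected without a human in the loop, which only works in domains where correctness is mechanically checkable.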
