I don't understand why there isn't a public dataset for reasoning that humans and LLMs can improve, like Wikipedia (i.e. with automatic judging of contributions, etc.).
For reasoning, a manually curated dataset is too small; you need to be able to automatically generate vast volumes of synthetic reasoning data with provably correct answers. That's presumably why Claude and GPT are so good at using Lean (the theorem prover): they get fed a lot of synthetic, verifiably correct training data.
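To make "synthetic data with provably correct answers" concrete, here is a toy sketch in Python. It is only an illustration of the generate-then-verify idea, not how any lab actually builds training data. The names `make_problem`, `verify`, and `generate` are made up for this example, and simple arithmetic stands in for what a proof checker like Lean would do on real theorems.

```python
import random


def make_problem(rng):
    """Generate one arithmetic problem whose answer is computed, not guessed."""
    a, b, c = (rng.randint(2, 99) for _ in range(3))
    return {
        "question": f"What is ({a} + {b}) * {c}?",
        "operands": (a, b, c),
        "answer": (a + b) * c,
    }


def verify(problem, claimed):
    """Check a claimed answer by an independent route (distributing the product)."""
    a, b, c = problem["operands"]
    return claimed == a * c + b * c


def generate(n, seed=0):
    """Produce n problems; only keep the ones whose stored answer passes the checker."""
    rng = random.Random(seed)
    problems = (make_problem(rng) for _ in range(n))
    return [p for p in problems if verify(p, p["answer"])]


if __name__ == "__main__":
    dataset = generate(1000)
    print(len(dataset), "verified examples")
    print(dataset[0]["question"])
```

The point is that the checker, not a human, decides what goes into the dataset, so volume is limited only by compute. With a theorem prover the same loop applies, with the proof checker playing the role of `verify`.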
There is already a lot of effort going into collecting agent traces, including reasoning; see, e.g., this recent discussion: https://old.reddit.com/r/LocalLLaMA/comments/1u795pb/donate_...
We've been developing DataClaw for this: https://github.com/peteromallet/dataclaw