> code that does not compile. Why should I share that?
If you collect test cases for compilers, for example.
> tree-sitter sort of handles that
My worry is that the stability of committed ASTs would depend on tree-sitter being stable, and that might be difficult to guarantee for languages that are still in flux. Even most well-established languages gain new grammar once every few years, sometimes in backward-incompatible ways.
Maybe you meant tree-sitter itself will also be versioned inside this repository?
Tree-sitter can parse somewhat-bad code: it recovers from syntax errors and still produces a tree.
Also, there is an option to pick a codec for a particular file. It might use tree-sitter-C, or it might use general-text. The only issue here is that you can't change the codec and keep nice diffs.
So, these cases are handled.
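To make the per-file codec idea concrete, here is a rough sketch of how the choice could work. This is not the project's actual code: `pick_codec`, `CODEC_BY_SUFFIX`, and the `overrides` mapping are hypothetical names, and only the two codec names come from the comment above.

```python
from pathlib import PurePosixPath

# Hypothetical codec table; only the two codec names come from the thread.
FALLBACK_CODEC = "general-text"
CODEC_BY_SUFFIX = {".c": "tree-sitter-C", ".h": "tree-sitter-C"}


def pick_codec(path, overrides=None):
    """Return a codec name: explicit per-file override first,
    then a lookup by file suffix, then the plain-text fallback."""
    overrides = overrides or {}
    if path in overrides:
        return overrides[path]
    suffix = PurePosixPath(path).suffix.lower()
    return CODEC_BY_SUFFIX.get(suffix, FALLBACK_CODEC)
```

A compiler test case that is broken on purpose could then be pinned to the text codec with `pick_codec("tests/broken.c", {"tests/broken.c": "general-text"})`. Since changing the codec later loses nice diffs, the choice would presumably be made once per file and kept.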