mmm. interesting and fun concept, but it seems to me like the text is actually the right layer for storing and expressing changes since that is what gets read, changed and reasoned about. why does it make more sense to use asts here?
are these asts fully normalized or do (x) and ((x)) produce different trees, yet still express the same thing?
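As a point of reference for the normalization question, here is a small sketch using Python's standard `ast` module (nothing to do with Beagle's own format, which may behave differently). In Python's case, redundant parentheses are dropped by the parser, but semantically equivalent rewrites are not unified.

```python
import ast

# Python's parser drops redundant parentheses, so both sources give the same tree.
one = ast.dump(ast.parse("(x)"))
two = ast.dump(ast.parse("((x))"))
parens_normalized = one == two
print(parens_normalized)

# Operand order is preserved, though: `a + b` and `b + a` are different trees,
# even in cases where they would evaluate to the same value.
order_normalized = ast.dump(ast.parse("a + b")) == ast.dump(ast.parse("b + a"))
print(order_normalized)
```

So "AST" alone does not imply a canonical form; how much gets normalized depends on the parser and on any extra canonicalization pass layered on top.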
why change what is being stored and tracked when the language aware metadata for each change can be generated after the fact (or alongside the changes)? (adding transform layers between what appears and what gets stored/tracked seems like it could get confusing?)
100% agree. I think AST-driven tooling is very valuable (most big companies have internal tools akin to each operation Beagle provides, and Linux has Coccinelle / spatch, for example), but it's still easier to implement as a layer on top of source code than as the fundamental source of truth.
There are some clever things that can be done with merge/split using CRDTs as the stored transformation, but they're hard to reason about compared to just semantic merge tools, and don't outweigh the cognitive overhead IMO.
Having worked for many years with programming systems that were natively expressed as trees (often just operation trees and object graphs, discarding the notion of syntax completely), I've found this layer incredibly difficult for humans to reason about, especially when it comes to diffs. Usually you end up having to build a system that can produce and act upon text-based diffs anyway.
I think there's some notion of these kinds of revision management tools being useful for an LLM, but again, at that point you might as well run them aside (just perform the source -> AST transformation at each commit) rather than use them as the core storage.
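A rough sketch of what "run it aside" could look like, assuming Python sources and the standard `ast` module. The helper names `ast_fingerprint` and `classify_commit` are made up for illustration; a real hook would read the old and new file contents from the VCS rather than take strings.

```python
import ast
import hashlib


def ast_fingerprint(source: str) -> str:
    """Hash the parsed tree; whitespace, comments and redundant parens do not affect it."""
    tree = ast.parse(source)
    return hashlib.sha256(ast.dump(tree).encode("utf-8")).hexdigest()


def classify_commit(old_source: str, new_source: str) -> str:
    """Label a change using text as the source of truth and the AST as side metadata."""
    if old_source == new_source:
        return "unchanged"
    if ast_fingerprint(old_source) == ast_fingerprint(new_source):
        return "formatting-only"
    return "semantic"


print(classify_commit("x = 1\n", "x  =  1  # note\n"))
print(classify_commit("x = 1\n", "x = 2\n"))
```

The text stays the thing that is stored; the fingerprint is derived data that can be regenerated at any commit. One caveat: `ast.dump` output can differ between Python versions, so fingerprints are only comparable when produced by the same interpreter version.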
One nice thing about serializing/transmitting AST changes is that it makes it much easier to compose and transform change sets.
The text-based diff method works fine if everyone is working off a head, but when you're trying to compose a release from a lot of branches it's usually a huge mess. Text-based diffs also make maintaining forks harder.
Git is going to become a big bottleneck as agents get better.
Having a VCS that stores changes as refactorings combined with an editor that reports the refactorings directly to the VCS, without plain text files as intermediate format, would avoid losing information on the way.
The downside is tight coupling between VCS and editor. It will be difficult to convince developers to use anything other than their favourite editor when they want to use your VCS.
I wonder if you can solve it the language-server way, so that each editor that supports refactoring through language-server would support the VCS.
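To make the "store the refactoring, not the text" idea a bit more concrete, here is a toy sketch in Python. The `Rename` record and `apply_change` helper are hypothetical, not any real VCS or language-server API, and the rename is deliberately naive: it only touches plain `Name` nodes and ignores scoping, function parameters and attributes. It needs Python 3.9+ for `ast.unparse`.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Rename:
    """Hypothetical stored change: rename identifier `old` to `new`."""
    old: str
    new: str


class _Renamer(ast.NodeTransformer):
    def __init__(self, change: Rename) -> None:
        self.change = change

    def visit_Name(self, node: ast.Name) -> ast.AST:
        # Naive: rewrites every matching name, regardless of scope.
        if node.id == self.change.old:
            node.id = self.change.new
        return node


def apply_change(source: str, change: Rename) -> str:
    """Replay a stored refactoring against whatever the branch currently contains."""
    tree = _Renamer(change).visit(ast.parse(source))
    return ast.unparse(tree)


change = Rename(old="cnt", new="count")

# The feature branch added a new use of `cnt` after the rename was recorded.
feature_branch = "cnt = 0\ncnt += 1\nprint(cnt)\n"
renamed = apply_change(feature_branch, change)
print(renamed)
```

Because the stored change is the operation rather than a list of edited lines, replaying it on the feature branch also renames the use that did not exist when the refactoring was made, which a text patch would miss. Note that `ast.unparse` throws away the original formatting and comments, which is part of why the text-versus-tree question is contentious in the first place.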
> why does it make more sense to use asts here
For one, it eliminates a class of merge conflict that arises strictly from text formatting.
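A small illustration of that class of conflict, again using Python's standard `ast` module as a stand-in (the three snippets are invented for the example). Both branches touch the same line, so a line-based three-way merge would flag it, yet none of the versions differ as trees.

```python
import ast


def same_tree(a: str, b: str) -> bool:
    """True when two sources parse to the same AST (formatting ignored)."""
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


base = "total = compute(a, b)\n"
ours = "total = compute(a,b)\n"                          # one branch removed a space
theirs = "total = compute(\n    a,\n    b,\n)\n"         # the other exploded the call

# Line-based view: both sides changed the same line, and changed it differently.
text_conflict = ours != base and theirs != base and ours != theirs

# Tree-based view: nobody changed anything.
tree_conflict = not (same_tree(ours, base) and same_tree(theirs, base))

print(text_conflict, tree_conflict)
```

A deterministic formatter run on both branches before merging gets a similar result while still keeping text as the stored format.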
I always liked the idea of storing code in an abstract form, especially if editors supported edit-time formatting. I enjoy working on other people's code, but I don't think anybody likes the tedium of complying with style guides, especially ones that are enforced at the SCM level, which adds friction to creating even local, temporary revisions. This kind of thing would obviate that. That's why I also appreciate strict and deterministic systems like rustfmt. Unison goes a little further, which is neat, but I think they're struggling to get adoption because of that, even though I'm pretty sure they've got some better tooling for working outside the whole ecosystem. These decoupled tools are probably a good way to go.
I was messing around with a file-less paradigm that would present a source tree in arbitrary ways, like just showing individual functions, so you have the things you're working on co-located rather than switching between files. Kind of like the old VB IDE.
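For what it's worth, a per-function view like that can be sketched on top of ordinary text files. This is a minimal Python version using the standard `ast` module (3.9+ for the type hint syntax); `function_views` is a made-up name, and it only looks at top-level functions and leaves out decorators.

```python
import ast


def function_views(source: str) -> dict[str, str]:
    """Map each top-level function name to its exact source text."""
    views: dict[str, str] = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                views[node.name] = segment
    return views


source = (
    "import os\n"
    "\n"
    "def load(path):\n"
    "    return open(path).read()\n"
    "\n"
    "def save(path, data):\n"
    "    open(path, 'w').write(data)\n"
)

views = function_views(source)
for name, text in views.items():
    print(f"--- {name} ---")
    print(text)
```

An editor could show any subset of these side by side and write edits back into the underlying file, so the "file-less" presentation does not strictly require a file-less store.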