Hacker News

deskamess yesterday at 1:20 PM

I always wondered why ASTs were not a bigger part of both editing and scoping of changes/parsing code. I thought I read an article where they said 'grep' was just as effective. It kind of made sense for the case they were talking about.


Replies

miki123211 yesterday at 9:53 PM

I think we should use ASTs more, not for performance, but for easier code review.

Changes that are primarily code refactorings, like breaking up a large module into a bunch of smaller ones, or renaming a commonly-used class, are extremely tedious to review, both in LLM-generated diffs and human-written PRs. You still have to do it: LLMs have a habit of mangling comments when moving code across files, while for a human, an unassuming "rename FooAPIClient to LegacyFooAPIClient" PR is the best place to leave a backdoor when taking over a developer's account. Nevertheless, many developers just LGTM changes like this because of the tedium involved in reviewing them.

If one could express such changes as a simple AST-wrangling script in a domain-specific language, which would then be executed in a trusted environment after being reviewed, that would decrease the review burden considerably.
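As a rough illustration of what such a reviewable script could look like, here is a minimal sketch that uses Python's standard `ast` module as a stand-in for the domain-specific language (no specific DSL is named above, so this is an assumption). The names `RenameClass` and `rename_class` are hypothetical. The sketch only handles plain references and the class definition; imports and attribute access such as `module.FooAPIClient` are left out, and `ast.unparse` discards comments and formatting, so a real tool would need a comment-preserving parser.

```python
import ast


class RenameClass(ast.NodeTransformer):
    """Rename a class definition and every plain reference to it."""

    def __init__(self, old: str, new: str) -> None:
        self.old = old
        self.new = new

    def visit_Name(self, node: ast.Name) -> ast.Name:
        # Name nodes cover plain references such as FooAPIClient(...)
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_ClassDef(self, node: ast.ClassDef) -> ast.ClassDef:
        if node.name == self.old:
            node.name = self.new
        self.generic_visit(node)  # keep walking into bases and the class body
        return node


def rename_class(source: str, old: str, new: str) -> str:
    tree = RenameClass(old, new).visit(ast.parse(source))
    # ast.unparse needs Python 3.9+ and drops comments and original layout
    return ast.unparse(tree)


if __name__ == "__main__":
    sample = (
        "class FooAPIClient:\n"
        "    pass\n"
        "\n"
        "client = FooAPIClient()\n"
        "msg = 'FooAPIClient'\n"
    )
    print(rename_class(sample, "FooAPIClient", "LegacyFooAPIClient"))
```

The reviewer reads these thirty-odd lines once instead of a diff touching every call site. Note that the string literal `'FooAPIClient'` is deliberately left alone, since it is not a `Name` node.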

I believe that with agentic development, the most important constraint we have is human time. Making the LLM better and faster won't help us much if the human still needs to spend a majority of their time reading code. We should do what we can to give us less code to read, without losing confidence in the changes that the LLM makes.

GodelNumbering yesterday at 1:27 PM

Grep is effective for the most part, except in situations where you have a huge codebase and the thing you're looking for appears in too many places, both as a symbol and as a non-symbol (in comments, strings, and so on).
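To make the symbol vs. non-symbol distinction concrete, here is a small sketch comparing a substring search with an AST-based one, using Python's standard `ast` module. The helper names `grep_hits` and `symbol_hits` and the `SAMPLE` snippet are made up for this example, and the symbol check only covers names, attributes, and function/class definitions.

```python
import ast

SAMPLE = '''\
def retry(n):
    # retry the request up to n times
    return n

log = "will retry later"
retry(3)
'''


def grep_hits(source: str, name: str) -> list[int]:
    """Line numbers where `name` appears anywhere in the text (grep-style)."""
    return [i for i, line in enumerate(source.splitlines(), start=1) if name in line]


def symbol_hits(source: str, name: str) -> list[int]:
    """Line numbers where `name` appears as an identifier in the syntax tree."""
    hits = set()
    definitions = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and node.id == name:
            hits.add(node.lineno)
        elif isinstance(node, ast.Attribute) and node.attr == name:
            hits.add(node.lineno)
        elif isinstance(node, definitions) and node.name == name:
            hits.add(node.lineno)
    return sorted(hits)


if __name__ == "__main__":
    print(grep_hits(SAMPLE, "retry"))    # [1, 2, 5, 6]
    print(symbol_hits(SAMPLE, "retry"))  # [1, 6]
```

The comment on line 2 and the string on line 5 match the substring search but never show up in the tree as identifiers, which is the noise that piles up in a large codebase.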

Another annoying thing about plain grep is that LLMs often end up pulling in bundled packages, where a single line can be large enough to ruin the context window.

sigbottle yesterday at 4:15 PM

It's not intuitive to humans, even after learning parsing theory. I can do basic name refactorings. I've even written neovim plugins to do one specific thing with the AST (DFS down and delete one subtree which I understand). Those are fine.

I would not be comfortable doing an on-the-fly "rewrite all subtrees that match this pattern" kind of edit.

It seems like a tool that's good for LLMs though.
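For reference, a "rewrite all subtrees that match this pattern" edit can be quite small when the pattern is narrow. The sketch below is only an illustration, using Python's standard `ast` module rather than any editor plugin API. It rewrites `== None` comparisons to `is None`; the class and function names are hypothetical. It assumes the rewrite is semantically safe, which is not strictly true for types that override `__eq__`.

```python
import ast


class EqNoneToIsNone(ast.NodeTransformer):
    """Rewrite `<expr> == None` as `<expr> is None` (and `!=` as `is not`)."""

    def visit_Compare(self, node: ast.Compare) -> ast.Compare:
        self.generic_visit(node)  # handle comparisons nested inside this one first
        for i, (op, right) in enumerate(zip(node.ops, node.comparators)):
            is_none = isinstance(right, ast.Constant) and right.value is None
            if is_none and isinstance(op, ast.Eq):
                node.ops[i] = ast.Is()
            elif is_none and isinstance(op, ast.NotEq):
                node.ops[i] = ast.IsNot()
        return node


def rewrite(source: str) -> str:
    tree = EqNoneToIsNone().visit(ast.parse(source))
    # ast.unparse needs Python 3.9+ and drops comments and original layout
    return ast.unparse(tree)


if __name__ == "__main__":
    print(rewrite("if a == None and b != None:\n    pass\n"))
    print(rewrite("y = x == 0\n"))  # no None on the right, so nothing changes
```

The whole pattern lives in one `visit_Compare` method, so the part a human has to trust is the match condition, not every site it touches.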

lukeundtrug yesterday at 7:43 PM

Happened to have written both a tool and a blog post about the topic. It’s more about the different technical approaches to solving the problem, but it might still interest you :)

https://www.context-master.dev/blog/deterministic-semantic-c...

Let me know what you think.

jwr yesterday at 7:55 PM

I just realized that the fact that LLMs work so well for me in Clojure might be partly because of the clojure-mcp tools. They provide structural browsing and editing.

tmzt yesterday at 7:52 PM

Has anybody thought about encoding AST tokens as LLM tokens, similar to how different words can have different meanings and that's reflected in their embedding?
