I can recommend anyone reading pratts original paper. Its written in a very cool and badass style.
"I’ve read many articles on the same topic but never found it presented this way" it reminds me a lot of a description I saw in a video with Jonathan Blow talking about precedence and parsing with Casey Muratori.
The video is 3 hours long though, and I'm not sure the text he shows is available.
At this point he's talking about left leaning vs right leaning trees, after having already talked about one of them: https://youtu.be/fIPO4G42wYE?t=2256&si=aanthLGe-q8ntZez
The latest implementation of Picol has a Tcl-alike [expr] implemented in 40 lines of code that uses Pratt-style parsing: https://github.com/antirez/picol/blob/main/picol.c#L490
> I’ve read many articles on the same topic but never found it presented this way - hopefully N + 1 is of help to someone.
Can confirm; yes it was helpful! I've never thought seriously about parsing and I've read occasionally (casually) about Pratt parsing, but this is the first time it seemed like an intuitive idea I'll remember.
(Then I confused myself by following some references and remembering the term "precedence climbing" and reading e.g. https://www.engr.mun.ca/~theo/Misc/pratt_parsing.htm by the person who coined that term, but nevermind — the original post here has still given me an idea I think I'll remember.)
You can either use the stack in an intuitive way, or you can change the tree directly in a somewhat less intuitive way without recursion. Essentially either DF or BF. I don’t see how it matters much anymore with stacks that grow automatically, but it’s good to understand.
An even simpler way imo, is explicit functions instead of a precedence table, then the code pretty much has the same structure as EBNF.
Need to parse * before +? Begin at add, have it call parse_mul for its left and right sides, and so on.
parse_mul() {
left = parse_literal()
while(is_mul_token()) { // left associative
right = parse_literal()
make_mul_node(left, right)
}
}
parse_add() {
left = parse_mul()
while(is_add_token()) { // left associative
right = parse_mul()
make_add_node(left, right)
}
}
Then just add more functions as you climb up the precedence levels.Also discussed in detail in Crafting Interpreters.[0]
Also if you're looking into this area you'll find there is another algorithm called "Precedence climbing", which is really the same thing with some insignificant differences in how precedence is encoded.
There's also the "shunting yard" algorithm, which is basically the iterative version of these algorithms (instead of recursive). It is usually presented with insufficient error checking, so it allows invalid input, but there's actually no reason you have to do it like that.
I will never forget the amusing attention I got from the professor when this topic was covered during my undergrad. It's only happened once, sadly, but this is only seconded by the time I was assisting a junior engineer with a related problem and was able to say "Oh, that's just a Pratt Parser. Let me show you."
Love Pratt parsing! Not a compiler guy, but I've spent way too many hours reflecting on parsing. I remember trying to get though the dragon book so many times and reading all about formal grammar etc. Until I landed on; recursive descent parsing + Pratt for expressions. Super simple technique, and for me is sufficient. I'm sure it doesn't cover all cases, but just for toy languages it feels like we can usually do everything with 2-token lookahead.
Not to step on anyone's toes, I just don't feel that formal grammar theory is that important in practice. :^)