logoalt Hacker News

logdahltoday at 10:58 AM7 repliesview on HN

Love Pratt parsing! Not a compiler guy, but I've spent way too many hours reflecting on parsing. I remember trying to get though the dragon book so many times and reading all about formal grammar etc. Until I landed on; recursive descent parsing + Pratt for expressions. Super simple technique, and for me is sufficient. I'm sure it doesn't cover all cases, but just for toy languages it feels like we can usually do everything with 2-token lookahead.

Not to step on anyone's toes, I just don't feel that formal grammar theory is that important in practice. :^)


Replies

erutoday at 12:29 PM

The Dragon book is not very good, to be honest.

It was probably decent when all you had was something like Pascal and you wanted to write a C compiler.

Parsing and compiling and interpreting etc are all much more at home in functional languages. Much easier to understand there. And once you do, then you can translate back into imperative.

For parsing: by default you should be using parser combinators.

show 3 replies
gignicotoday at 11:23 AM

Until you need to do more than all-or-nothing parsing :) see tree-sitter for example, or any other efficient LSP implementation of incremental parsing.

show 1 reply
marssaxmantoday at 4:56 PM

I am a compiler guy, and I completely agree. Parsing is not that hard and not that important. Recursive descent + pratt expressions is almost always the practical choice.

randomNumber7today at 11:42 AM

It's not for toy languages. Most big compilers use recursive descent parsing.

show 1 reply
signa11today at 11:57 AM

> Not to step on anyone's toes, I just don't feel that formal grammar theory is that important in practice. :^)

exactly this ! a thousand times this !

show 1 reply
joe_the_usertoday at 6:49 PM

Not to step on anyone's toes, I just don't feel that formal grammar theory is that important in practice. :^)

Well, it depends how formal you're talking about. I have to say that the standard you mention, recursive descent parsing + Pratt for expressions. actually requires you to understand what a formal language is - that it's a "thing" that can't (or shouldn't) be an object or a data structure but exists abstractly before any objects created by the program.

Moreover, the standard way of producing a recursive descend parser is to begin with your language in Chomsky normal form or some human understandable format and then convert to Greibach Normal form and that specification converts readily to your series of recursive functions. So all language transforms are useful to know (though you can skip steps if you have a good intuition of your language).

ogogmadtoday at 11:38 AM

Quick other one: To parse infix expressions, every time you see "x·y | (z | w)", find the operator of least binding power: In my example, I've given "|" less binding power than "·". Anyway, this visually breaks the expression into two halves: "x·y" and "(z | w)". Recursively parse those two subexpressions. Essentially, that's it.

The symbols "·" and "|" don't mean anything - I've chosen them to be visually intuitive: The "|" is supposed to look like a physical divider. Also, bracketed expressions "(...)" or "{...}" should be parsed first.

Wikipedia mentions that a variant of this got used in FORTRAN I. You could also speed up my naive O(n^2) approach by using Cartesian trees, which you can build using something suspiciously resembling precedence climbing.

show 1 reply