logoalt Hacker News

sievetoday at 1:44 AM7 repliesview on HN

Anyone trying to do this... the first thing you do is avoid lex/yacc/bison/antlr. You do not need all this ceremony. A recursive descent parser that uses Pratt parsing will work for a vast majority of cases.

The lexer/parser is never the bottleneck. In fact, you can write those two by hand over a single weekend for a largish language. With LLMs, it takes 15 minutes if you have an unambiguous spec.

The biggest time sink, and the reason you will fail for sure, is the inability to restrict the scope of the project. You start with a limited feature set and produce the entire compiler/vm toolchain. Then you get greedy and fiddle with the type system, adding features that you have never used and probably never will. And now you have to change every single phase from start to end.

I mostly give up at this stage.


Replies

wg0today at 2:40 AM

Jonathan Blow wrote his own game enginee and for that he wrote his own programming language.

He went through straight recursive descendant parser and said same thing.

I think compiler courses teach from yacc, bison etc that's where this whole thing came from but in practice people discovered that hand written recursive descendant parsers are all you need.

show 1 reply
MattPalmer1086today at 9:49 AM

Yep. I started out using ANTLR for one project of mine. I ended up spending loads of time fighting its syntax to do really quite simple things, and it was slow! I probably wasn't holding it right. In the end, I wrote a simple lexer and recursive descent parser (with a small amount of lookahead) in a weekend. The code was easy to read, easy to extend, and fast.

barnabeetoday at 8:25 AM

Probably the most fun I’ve had with LLMs has been slowly making a programming language as a side project.

I used to give up somewhere around the type system, too, but this time I’m approaching something vaguely useful. It even has a basic LSP.

It’s been both enjoyable and enlightening, and LLMs turn out to be an excellent pair designer as (in addition to implementation) they’re really good at summarising the impact of various decisions.

> the reason you will fail for sure, is the inability to restrict the scope of the project

This will be the reason, for sure. But then the scope of every project like this tends towards building an OS with it then replacing every piece of software, including all embedded devices :)

show 1 reply
pan69today at 3:51 AM

I learned to do this about 2 years ago (pre LLM). I have been developing software for ~30 years and somehow doing something like this was a major mental obstacle, mostly created by the perception of "the dragon book", as in this topic being full of mystical unobtainable incantations, so I never even dared venture into this space. Silly, I know. However, after diving into this and learning to write a recursive descent parser for a DSL I wanted to write, it felt like I'd acquired a superpower. Totally understand that there is many more layers to all of this, layer that can get very complex, but just learning that first bit...

show 1 reply
zelphirkalttoday at 8:27 AM

Many projects wish they had a proper grammar. When a project turns useful and people want to port it, or support it on other platforms, a grammar makes that job much easier.

I am not quite sure what you mean by having a recursive descent parser, because you can write one manually, or you can generate one from a grammar, which would have the additional benefits of having a grammar. I recommend having a grammar.

show 1 reply
Panzerschrektoday at 5:54 AM

I agree. I have written lexer/parser for my language twice (for compiler0 and for a self-hosted compiler). It's a very dumb task requiring almost to mental load.

Profiling results show that the amount of time spent lexing/parsing is negligible - less than 1% of the total compilation time.

true_religiontoday at 2:14 AM

I wrote a few of these due to an interest in compilers and hardware.

The easiest syntax to copy if you’re looking for a high level language is Smalltalk.

But most of the time, I wouldn’t even use that. Simple imperative languages that look like BASIC works pretty well in most domains. If you simplify the syntax a little, it’s very easy to understand the compiler and use it for say when you want users to input code into existing systems.

show 1 reply