Hacker News

renehsz, last Thursday at 11:36 PM (1 reply)

Wow, hats off to you! This is one of the most impressive solo projects I've seen in a while!

Making a toy C compiler isn't rocket science, but developing one that's complete and production-ready is a whole nother story. AFAICT, Kefir fits into the latter category:

- C17/C23 compliance

- x86_64 codegen

- debug info gen

- SSA-based optimization passes (the most important ones)

- has a widely-compatible cc cli

- is extensively tested/fuzzed

Some advantages compared to the big three (GCC, Clang, MSVC):

- It's fairly small and simple

- That means it's understandable and predictable - no surprises in terms of what it can and cannot do (regarding optimizations in particular)

- Compilation is probably very fast (although I haven't done any benchmarking)

- It could be modified or extended fairly easily

There might be real interest in this compiler from people/companies who

- value predictability and stability very highly

- want to be in control of their entire software supply chain (including the build tools)

- simply want a faster compiler for their debug builds

Think security/high-assurance people; even the suckless or handmade community might be interested.

So it's time to market this thing! Get some momentum going! It would be too sad to see this project fade away in silence. Announce it in lots of places, maybe get it on Compiler Explorer, etc. (I'm not saying that you have to do this, of course. But some people could genuinely benefit from Kefir.)

P.S. Seems like JKU has earned its reputation as one of the best CS schools in Austria ;-)


Replies

jprotopopov, yesterday at 5:29 PM

(I thought the announcement had completely faded, so I hadn't even checked the replies.)

I'll immediately reveal some issues with the project. On the compilation speed, it is unfortunately atrocious. There are multiple reasons for that:

1. Initially the compiler was much less ambitious, and was basically generating stack-based threaded code, so everything was much simpler. I have managed to make it more reasonable in terms of code generation (it now has a real code generator and an optimization pipeline), but there is still a huge legacy in the code base. There is a whole layer of stack-based IR which is translated from the AST, and then transformed into an equivalent SSA-based IR. Removing that means rewriting the whole translator part, for which I am not ready.

2. You've outlined some appealing points (standard compliance, debug info, optimization passes), but again -- this came at the expense of being over-engineered and bloated inside. Whenever I have to implement a new feature, I hedge and over-abstract to keep it manageable and avoid hitting unanticipated problems in the future. This has served quite well in terms of development velocity and extending the scope (many parts have not seen a complete refactoring since the initial implementation in 2020/2021, I just build on top of them), but the efficiency of the compiler itself has suffered.

3. I have not particularly prioritized this issue. Basically, I start optimizing the compiler itself only when something gets too unreasonable (in terms of either run time or memory). There are all kinds of inefficiencies, O(n^2) algorithms and such, simply because I knew that I would be able to swap that part out should that be necessary, but never actually did. I think compiler efficiency has been the most de-prioritized concern for me.
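
To make the stack-IR-to-SSA step from point 1 concrete, here is a minimal sketch of the general technique: symbolically executing the operand stack so that every pushed value gets a fresh name. This is not Kefir's code; the instruction names (`push`, `add`, `mul`), the tuple encoding, and the `stack_to_ssa` helper are all made up for illustration, and it only handles straight-line code (real control flow would also need phi nodes at merge points).

```python
# Hypothetical stack-based IR: ("push", constant) or (binary_op,).
# Not Kefir's actual instruction set; names are illustrative only.
BINARY_OPS = {"add", "mul"}


def stack_to_ssa(ops):
    """Convert straight-line stack code into SSA-style three-address code.

    The operand stack is simulated at translation time: instead of runtime
    values it holds the names of the SSA values that would be on the stack.
    """
    stack = []  # names of SSA values currently "on the stack"
    ssa = []    # emitted SSA instructions, one fresh name per result
    for op in ops:
        dest = f"v{len(ssa)}"  # each instruction defines exactly one new value
        if op[0] == "push":
            ssa.append(f"{dest} = const {op[1]}")
        elif op[0] in BINARY_OPS:
            rhs = stack.pop()  # top of stack is the right-hand operand
            lhs = stack.pop()
            ssa.append(f"{dest} = {op[0]} {lhs}, {rhs}")
        else:
            raise ValueError(f"unknown instruction: {op[0]}")
        stack.append(dest)
    return ssa, stack


# (2 + 3) * 4 expressed as stack code
program = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]
ssa, remaining = stack_to_ssa(program)
for line in ssa:
    print(line)
```

Even in this toy form, the extra layer means each expression is materialized twice (once as stack code, once as SSA), which hints at why such a pipeline costs compile time.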

Basically, if one is concerned with compilation speed, it is literally better to pick gcc, not even talking about something like tcc. Kefir is abysmal in that respect. In terms of code base size, it is 144k (sans tests, 260k in total), which is again not exactly small. It's manageable for me, but not hacker-friendly.

With respect to marketing, I am kind of torn. I cannot work on this thing full time unless somebody is ready to provide sufficient full-time funding for myself and for other expenses (machines for tests, etc). Otherwise, I'll just increase the workload on myself and reduce the amount of time I can spend actually working on the code, so it'll probably be a net loss for the project. Either way, for now I treat it closer to an art project than a production tool.

As for compiled code performance, I have addressed it here https://lobste.rs/s/fxyvwf -- it's better than, say, tcc, but nowhere near well-established compilers. I think this is reasonable to expect, and the exact ways to improve that a bit are also clear to me; it's only a question of development effort.

P.S. JKU is a great school, although by the time I enrolled there the project had already been on the verge of bootstrapping itself.

EDIT: formatting