Hacker News

deepsun, last Sunday at 10:55 PM (1 reply)

Why not compile it to Java source code (not bytecode)? Users could then use their own Java compiler.

That's what ANTLR does, for example: it generates source code that parses various kinds of text into an AST.


Replies

exabrial, yesterday at 12:27 AM

Great question. I actually tried that! In fact, m2cgen is a project that does exactly that.

It works fine for simple models but breaks down for production-sized tree ensembles. The JVM has a hard 64KB limit on the bytecode of a single method, and with generated source it is javac, not you, that controls how your deeply nested if/else trees get laid out. m2cgen's own FAQ says to reduce the number of estimators when you hit recursion limits during generation. With direct bytecode emission I control the method structure precisely: I can split across methods exactly where needed and manage the constant pool directly. I also emit much more efficient bytecode than what you get from compiling m2cgen's equivalent source.
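To make the method-splitting point concrete, here is a minimal sketch of one way to plan such splits. It is not petrify's actual code: the class name `MethodSplitPlanner`, the `plan` method, and the per-tree size estimates are all hypothetical, and a real emitter would also have to split a single oversized tree at a subtree boundary, which this sketch only rejects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: decides which trees of an
// ensemble share one generated method so that no method exceeds the JVM limit.
final class MethodSplitPlanner {
    // The class file format requires a method's code_length to be below 65536.
    static final int MAX_METHOD_BYTES = 65_535;

    /**
     * Greedily packs trees, in order, into groups whose summed size estimate
     * stays within the budget. treeSizes[i] is an assumed estimate of the
     * bytecode size of tree i; pass a smaller budget to leave headroom for
     * each method's prologue and return instructions.
     */
    static List<List<Integer>> plan(int[] treeSizes, int budget) {
        List<List<Integer>> groups = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int used = 0;
        for (int i = 0; i < treeSizes.length; i++) {
            int size = treeSizes[i];
            if (size > budget) {
                // A single tree this large would need splitting at a subtree.
                throw new IllegalArgumentException(
                        "tree " + i + " alone exceeds the budget");
            }
            if (used + size > budget) {
                // Close the current method and start a new one.
                groups.add(current);
                current = new ArrayList<>();
                used = 0;
            }
            current.add(i);
            used += size;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        int[] sizes = {30_000, 30_000, 10_000, 60_000, 5_000};
        // prints [[0, 1], [2], [3, 4]]
        System.out.println(plan(sizes, MAX_METHOD_BYTES));
    }
}
```

Each inner list would become one generated method, with a small top-level method calling them in turn and combining the results. When you go through javac instead, the generator can only influence this indirectly through how it shapes the source.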

The source code is also a pretty useless intermediate step: it sets off all kinds of static analysis alarms in your stack, and I also worry about source code injection (not that it can't happen with petrify, it's just a lot harder).

Finally, I'm grateful for the sweat the authors of m2cgen have put in, but the project has gone without updates for 4 years. That doesn't mean it's useless (some mature software never sees updates), but it's not a positive sign either.