Hacker News

ianbutler, yesterday at 4:43 PM (6 replies)

Sorry where are we seeing that it failed? It compiled multiple projects successfully albeit less optimized.

" It lacks the 16-bit x86 compiler that is necessary to boot Linux out of real mode. For this, it calls out to GCC (the x86_32 and x86_64 compilers are its own).

It does not have its own assembler and linker; these are the very last bits that Claude started automating and are still somewhat buggy. The demo video was produced with a GCC assembler and linker.

The compiler successfully builds many projects, but not all. It's not yet a drop-in replacement for a real compiler. The generated code is not very efficient. Even with all optimizations enabled, it outputs less efficient code than GCC with all optimizations disabled.

The Rust code quality is reasonable, but is nowhere near the quality of what an expert Rust programmer might produce. "

For faffing about with a multi agent system that seems like a pretty successful experiment to me.

Source: https://www.anthropic.com/engineering/building-c-compiler

Edit: Like I think people don't realize not even 7 months ago it wasn't writing this at all.


Replies

pron, yesterday at 5:38 PM

> where are we seeing that it failed?

Anthropic said the experiment failed to produce a workable C compiler:

- I tried (hard!) to fix several of the above limitations but wasn’t fully successful. New features and bugfixes frequently broke existing functionality.

- The compiler successfully builds many projects, but not all. It's not yet a drop-in replacement for a real compiler.

(source: https://www.anthropic.com/engineering/building-c-compiler)

Software that cannot be evolved is dead software. That in some PR communications they misrepresented their own engineer's report is beside the point.

> It compiled multiple projects successfully albeit less optimized.

150,000x slower (https://github.com/harshavmb/compare-claude-compiler) is not "less optimised". It's unworkable.

> Like I think people don't realize not even 7 months ago it wasn't writing this at all.

There's no doubt that producing a C compiler that isn't workable and is effectively bricked as it cannot be evolved but still compiles some programs is great progress, but it's still a long way from autonomously building production software. Can today's LLMs do amazing things and offer tremendous help in software development? Absolutely. Can they write production software without careful and close human supervision? Not yet. That's not disparagement, just an observation of where we are today.

josephg, today at 4:29 AM

> Sorry where are we seeing that it failed?

Try it yourself.

I've been using Claude to make a project over the last few weeks. It's written ~70k LOC to solve a complex problem. I've found that it can get surprisingly far in a 1-shot, but about 90% of the work I've had it do (measured in time and tokens) is cleaning up the junk it outputs in its first pass. I'm finding my Claude sessions have a rhythm like this:

1. Plan and implement some new feature.

2. Perform a code review of what you just did. Fix obvious problems. Flag bugs, issues, poor factoring, messy abstractions, etc. Make a prioritised list of things to fix (then fix them).

3. (Later) fixes:

- Write tests for the code you wrote and fix the bugs you find.

- Run the code through memory leak checks, and fix bugs.

- Do a performance analysis using benchmarks and profiling tools, and make any high priority performance improvements.

- Read the whole program, looking for ways in which the code you've just written could fit in better with the rest of the program. Fix any issues.

- In directory X is the full documentation for the library you're using. Reread it then review the code you wrote. Are there better ways we could make use of the library?

And so on.

Claude's 1-shot output is often usable, but it's consistently chock full of problems. Bugs. Memory leaks. Bad factoring. Too many globals. Poor use of surrounding code. And so on. It's able to fix many of these problems itself if you prompt it right. (Though even then the code is often still pretty bad in many ways that seem obvious to me.)

At the moment I think I'm spending tokens at about a 1:9 ratio of feature work to polish. Maybe its 1-shot output is good enough quality for you. To me it's unacceptable. Maybe a few models down the line. But it's not there yet.

nvme0n1p1, yesterday at 6:40 PM

Why are you quoting from their marketing blog as if it's a reliable source?

https://github.com/anthropics/claudes-c-compiler/issues/1

> Apparently compiling hello world exactly as the README says to is an unfair expectation of the software.

dnautics, yesterday at 5:01 PM

Yeah I think people are really underestimating what LLMs can do even without specs.

As an example, I did an exploratory attempt to add custom software over some genuinely awful Windows software for a scientific imaging station with a proprietary industrial camera. Five days later Claude and I had figured out how to USB-pcap sample images, and it's been operationalized and running smoothly for months now. 100% of the code was written by Claude, and it's all clean (I reviewed it myself). Pretty much all I did was unstick it in a few places (e.g. "hey, based on the file sizes it looks like the images are being sent as a 16-bit format").
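
To illustrate the kind of file-size hint mentioned above, here is a minimal sketch (not the code from that project). The function name, the `header_bytes` parameter, and the 640x480 frame size are all assumptions made up for the example; it only checks whether a capture's size works out to a whole number of bytes per pixel.

```python
from typing import Optional


def infer_bytes_per_pixel(file_size: int, width: int, height: int,
                          header_bytes: int = 0) -> Optional[int]:
    """Guess bytes per pixel from the size of a raw, uncompressed frame.

    Returns None when the payload is not a whole multiple of the pixel count.
    All names and parameters here are hypothetical, for illustration only.
    """
    pixels = width * height
    payload = file_size - header_bytes
    if pixels <= 0 or payload <= 0 or payload % pixels != 0:
        return None
    return payload // pixels


# Hypothetical 640x480 frame captured as 614,400 bytes.
bpp = infer_bytes_per_pixel(614_400, 640, 480)
if bpp is None:
    print("size does not match a whole number of bytes per pixel")
else:
    print(f"{bpp * 8}-bit pixels")
```

A result of 2 bytes per pixel would point at a 16-bit format; compressed or padded frames would need a different approach.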

For day to day work, I'll often identify a bug, "hey, when I shift click on this graphical component, it's not doing the right thing". I go tell Claude to write a RED (failing) integration test, then make it pass.

Zero lines of code manually written. Only occasionally do I have to intervene and rearchitect. Usually this involves me writing about ten lines of scaffold code, explaining the architectural concept, and telling it to just go.
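
The red-then-green loop described above might look roughly like this. It is a sketch only: the `Selection` class and the shift-click range behaviour are hypothetical stand-ins, not the commenter's code. The test is written first and fails while `click` ignores `shift`; the range branch shown here is the change that makes it pass.

```python
class Selection:
    """Hypothetical selection model for a list-like graphical component."""

    def __init__(self):
        self.selected = set()
        self.anchor = None  # index of the last plain (non-shift) click

    def click(self, index: int, shift: bool = False) -> None:
        if shift and self.anchor is not None:
            # The fix: shift-click selects everything between anchor and index.
            lo, hi = sorted((self.anchor, index))
            self.selected = set(range(lo, hi + 1))
        else:
            self.selected = {index}
            self.anchor = index


def test_shift_click_selects_range() -> None:
    # Written first; fails (RED) against an implementation that ignores shift.
    sel = Selection()
    sel.click(2)
    sel.click(5, shift=True)
    assert sel.selected == {2, 3, 4, 5}


test_shift_click_selects_range()
```

In a real project the test would live in the test suite and drive the actual UI component rather than a toy model.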

jyounker, yesterday at 9:43 PM

By "non-workable" I think people mean that it won't compile Hello World.

YZF, yesterday at 7:03 PM

GCC has only like a billion man hours in it?

Assembler and linker are not part of a compiler. They are separate tools. They are also generally much simpler.