Hacker News

tadamcz today at 12:01 PM

> Eg cal is totally routine. I would expect most sophomores to be able to write a perfectly good cal.

This is incidental to the main disagreement, but btw I also doubt this.

Let's try to make the claim more precise. E.g., are you saying the average university undergraduate studying CS would reimplement cal from scratch (stdlib only), matching the output exactly for all 1365 MirrorCode test cases, in (say) 3 days of full-time work (without AI assistance, obviously)? I'd bet against it!

Here is the manual for the cal that we use: https://media.githubusercontent.com/media/epoch-research/Mir...

You can also look at a full transcript of an LLM solving the task: https://epochai-public-eval-logs-manual.s3.amazonaws.com/eva...

The data is here: https://github.com/epoch-research/MirrorCode-data/


Replies

LeCompteSftware today at 1:32 PM

I didn't say "3 days of full-time work"; that is totally unreasonable. I was giving them basically unlimited time to do whatever slow testing and research they needed. And let me qualify my statement: when I say "I would expect most sophomores to be able to do this," I mean "if most sophomores can't do this, then their university is badly failing them." (If you want to split hairs about modern undergrads not learning C, then I think this conversation is over.)

Of course it would take them a while to learn facts about datetime that the LLM doesn't need to learn. If your argument is about cost optimization then congrats, you win. The point is that it doesn't take a huge amount of C expertise to do this successfully: the standard implementation is nothing you wouldn't see in K&R: https://raw.githubusercontent.com/util-linux/util-linux/refs... It's routine.
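For a sense of scale, here is a rough sketch of the core calendar arithmetic in K&R-level C. It is not taken from the util-linux source and makes no attempt to match cal's exact output. The function names are made up for illustration, the weekday computation uses Sakamoto's method, and it assumes a purely Gregorian calendar, so it ignores things like the September 1752 switchover that the real cal handles.

```c
#include <stdio.h>

/* Gregorian leap-year rule. */
int is_leap(int year) {
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Number of days in a month; month is 1..12. */
int days_in_month(int year, int month) {
    static const int days[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    return (month == 2 && is_leap(year)) ? 29 : days[month - 1];
}

/* Day of week via Sakamoto's method: 0 = Sunday ... 6 = Saturday.
   Assumption: Gregorian dates only. */
int day_of_week(int year, int month, int day) {
    static const int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    if (month < 3)
        year -= 1;
    return (year + year / 4 - year / 100 + year / 400 + t[month - 1] + day) % 7;
}

/* Print a bare month grid (no title line, no highlighting). */
void print_month(FILE *out, int year, int month) {
    int first = day_of_week(year, month, 1);
    int ndays = days_in_month(year, month);

    fputs("Su Mo Tu We Th Fr Sa\n", out);
    for (int i = 0; i < first; i++)
        fputs("   ", out);              /* pad up to the first weekday */
    for (int d = 1; d <= ndays; d++) {
        int col = (first + d - 1) % 7;
        fprintf(out, "%2d", d);
        fputc((col == 6 || d == ndays) ? '\n' : ' ', out);
    }
}
```

Calling `print_month(stdout, 2024, 1)` prints a grid whose first day lands in the Monday column. The slow part for a student is everything around this: option parsing, multi-month layout, and the calendar edge cases.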

But a nontrivial database, even a simple one like SQLite, really does require professional-level C expertise. It is not routine. So your comparison to ProgramBench still seems apples-to-oranges.
