This should be the real benchmark of AI coding skills: how fast do we get the safe, modern infrastructure and tooling that everyone agrees we need but nobody can fund?
If Anthropic wants marketing for Mythos without publishing it, show us servo contrib log or something like that. It aligns nicely with their fundamental infrastructure safety goals.
I'd trust that way more than x% increase on y bench.
Hire a core contributor on Servo or Rust, give them unlimited model access, and let's see how far we get with each release.
> show us servo contrib log or something like that
Servo may not be the best project for this experiment, as it has a strict policy of not accepting AI-generated contributions.
The problem with such infrastructure is not the initial development overhead.
It's the maintenance. The long term, slow burn, uninteresting work that must be done continually. Someone needs to be behind it for the long haul or it will never get adopted and used widely.
Right now, at least, LLMs are not great at that. They're great for quickly creating smaller projects, but they get worse as those projects grow older and larger.
Replicating Chromium as a benchmark? ;)
Replicating Rust would also be a good one. There are many Rust-adjacent languages that ought to exist and would greatly benefit mankind if they were created.
The true solution to this is to fund things that are important, especially when billion-dollar companies are making a fortune from them.
Perhaps, you know, not everything, and especially not every thread on HN, has to be about AI?
I read the link twice and there's no mention of AI or LLMs. I don't know why people are so eager to chime in and try to steer the conversation towards AI.
Agreed. Which other software does society need badly?
Oh good, I was worried for a sec that people wouldn't be talking about AI in this thread.
We do not need vibe-coded critical infrastructure.