The cause of this is that the cost of creating plausible-looking code contributions has gone down, so PR proposals can multiply, while flaws still threaten project security and LLMs can be confidently wrong. Human review is needed right now to maintain the integrity of the project, but it takes time and costs money. Ladybird's developers, and we as a community, can't easily tell "this is what we want" from "this is not what we want" without manual review, because we haven't settled on a reliable representation of what our code means and what its side effects are: one that is time-efficient, secure, and meaningfully interpretable at scale.
This is partly due to Ladybird building on low-level systems-language primitives that make problems harder to identify. They are porting to Rust, but it's not fair to say that C++ is single-handedly the cause, because in any complicated, interconnected codebase the complexity compounds easily, regardless of the language. It's a real shame we don't have a trust-graph filter as a stop-gap: a social model of who is trusted for what, used to filter contributors purely as a heuristic that reduces the risk of bad contributions (not as solid proof of soundness).
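To make the trust-graph idea a bit more concrete, here is a minimal sketch in Python. Everything in it is hypothetical (the `TrustGraph` class, the `vouch`/`score`/`triage` names, the decay factor, the hop limit, and the threshold are all assumptions for illustration, not anything Ladybird or any forge actually provides). It models "who is trusted for what" as per-area vouching edges starting from a set of maintainers, with trust decaying at each hop.

```python
from collections import defaultdict


class TrustGraph:
    """Hypothetical per-area trust graph: an edge means "A vouches for B in area X"."""

    def __init__(self, roots, decay=0.5, max_hops=3):
        self.roots = set(roots)      # e.g. core maintainers, fully trusted everywhere
        self.decay = decay           # assumed: trust halves with each hop from a root
        self.max_hops = max_hops     # assumed: vouching chains longer than this count for nothing
        self.vouches = defaultdict(set)  # (voucher, area) -> contributors vouched for

    def vouch(self, voucher, contributor, area):
        self.vouches[(voucher, area)].add(contributor)

    def score(self, contributor, area):
        """Return 1.0 for roots, decay**hops for reachable contributors, else 0.0."""
        frontier = set(self.roots)
        seen = set(self.roots)
        trust = 1.0
        # Breadth-first walk outward from the roots, one hop per iteration.
        for _ in range(self.max_hops + 1):
            if contributor in frontier:
                return trust
            next_frontier = set()
            for person in frontier:
                next_frontier |= self.vouches.get((person, area), set()) - seen
            seen |= next_frontier
            frontier = next_frontier
            trust *= self.decay
        return 0.0


def triage(graph, author, area, threshold=0.25):
    """Heuristic only: decides which PRs get reviewer time first, not whether code is sound."""
    return "review" if graph.score(author, area) >= threshold else "hold"


graph = TrustGraph(roots={"maintainer"})
graph.vouch("maintainer", "alice", "layout")
graph.vouch("alice", "bob", "layout")

print(triage(graph, "bob", "layout"))        # review (two hops from a root: 0.25)
print(triage(graph, "alice", "networking"))  # hold (nobody vouched for alice in this area)
```

A score like this only ranks the review queue; every accepted change would still need the same human review as before.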
This whole situation shows that the way development has been done isn't nearly as transparent as having the source code available would suggest.
We haven't been able to say what we want the code to do in a way that can be tested robustly enough to make openly accepting contributions sustainable. It's unfair to blame the team for that: it's an incredibly difficult problem, and they already need to develop and review their own changes with only so many hours in the day. I hope we figure out the representation and social trust graph problems, and that people continue to build on their great work.
Bad actors pay good money for vulnerabilities, and patient actors are invested in slowly introducing them. Agent loops like Codex or Claude show both the problem and the promise: Anthropic's Mythos model found ~271 Firefox 0-days and helped fix them.
It's bittersweet, in a way, that Ladybird is so good at showing how the incidental complexity of web browsers could be vastly reduced. To protest being gagged, cryptographers made t-shirts with the DeCSS DVD or RSA algorithms printed on them. Alan Kay suggests that t-shirt computing is actually a useful target, and STEPS, by his Viewpoints Research Institute, managed to distill some parts of OS-level and desktop publishing software down into minimal, more understandable abstractions. Those abstractions encode the rules of the programs with patterns better suited to the problems at hand, and might more plausibly fit on a small wardrobe of t-shirts. Browsers really need someone to make that range of t-shirts.
As a minority browser user (and someone wanting to build on them), I'm excited to see Ladybird become increasingly usable for real browsing. I'm hopeful that, in time, the spec representation gaps and social trust map heuristics will prove to be solvable problems, and that solving them could restore the dream of open source, or at least stop the trend of projects closing up (tldraw did this much earlier, for a less risky but still thorny project).
Full visibility of the source code is very different from having full legibility of the system: comprehensibility is the bottleneck.
(Seems I ran out of edit time!)