>Testing is the first layer of defense. My system now includes 1,300+ tests — from unit tests to minimal integration tests (e.g., proposer + acceptor only), all the way to multi-replica full integration tests with injected failures. See the project status.
I know LOC is a silly metric, but ~1300 tests for 130k lines averages out to a test per 100 lines - isn't this awfully low for a highly complex piece of code, even discounting the fact that it's vibecoded? 100 LOC can carry a lot of logic for a single test, even for just happy paths.
I'm also shifting to a vibe coding workflow, but I have a genuine question: whenever I use AI for Rust, it makes an insane amount of lifetime errors. I have no idea how people are churning out so many lines of code so quickly.
Honestly, despite all the hype around Rust in the community, the fact that AI can't handle lifetimes reliably makes me reluctant to use it. The AI constantly defaults to spamming .clone() or wrapping things in Rc, completely butchering idiomatic Rust and making the output a pain to work with.
On the other hand, it writes higher-level languages better than I do. For those succeeding with it, how exactly are you configuring or prompting the AI to actually write good, idiomatic Rust?
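To make the complaint concrete, here is a minimal sketch of the pattern: a clone-heavy function next to a borrowing version of the same logic. The function names and data are hypothetical, made up for illustration, and not taken from any project discussed in this thread.

```rust
// Hypothetical example: both functions return the longest word.

/// Clone-heavy shape: takes ownership of the whole vector, so callers
/// that still need their data must clone it first, and then the body
/// clones again.
fn longest_owned(words: Vec<String>) -> String {
    let mut best = String::new();
    for w in words {
        if w.len() > best.len() {
            best = w.clone(); // redundant: `w` is already owned here
        }
    }
    best
}

/// Borrowing shape: the result borrows from the input slice, so nothing
/// is copied. The explicit lifetime `'a` says the returned `&str` lives
/// as long as the slice it was taken from (elision would also work here).
fn longest_borrowed<'a>(words: &'a [String]) -> &'a str {
    let mut best: &str = "";
    for w in words {
        if w.len() > best.len() {
            best = w.as_str();
        }
    }
    best
}
```

Both compile and give the same answer; the difference is that a caller of `longest_owned` has to write `longest_owned(words.clone())` to keep using `words` afterwards, while `longest_borrowed(&words)` leaves the caller's data untouched.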
How many of those tests have you actually read yourself if all of them are generated by AI (also when you're sleeping)?
This is from 2025. I would like to see an update on how that system turned out after the vibe hype.
To me, the real question after reading this is: is your new implementation of Azure’s RSL now being used?
If it is, and it works well, then to me this is far more meaningful than the fact that AI wrote 130K lines of code.
Contrarian view: Why English will never be a programming language. https://www.slater.dev/2026/05/why-english-will-never-be-a-p...
I am having a different experience than a lot of other commenters here vibe coding with Rust. I am not a Rust programmer or evangelist. I have implemented a drop-in Bash replacement/clone in Rust that passes the upstream Bash test suite and a whole battery of its own. It is a tiny bit faster than Bash itself but consumes a bit more memory. But Codex and Claude both did a great job with it.
I also had it implement a wasm geodesic calculator in Rust and it's amazing and in my use case is better than geodesiclib using the same updated algorithm.
I'm a "C-nile" that Rust folks love to hate and did my first hacking in C (Deep Blue C) on Atari 8-bits. But I'm very impressed with these products and with the ability to leverage some features of Rust with them (e.g. audit every unsafe instance and define its invariants, etc.).
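As a rough illustration of what "audit every unsafe instance and define its invariants" can look like, here is a minimal sketch using the common `# Safety` / `// SAFETY:` comment convention. The functions are hypothetical and not taken from the Bash clone.

```rust
/// Returns the first element without a bounds check.
///
/// # Safety
/// The caller must guarantee that `slice` is non-empty.
unsafe fn first_unchecked(slice: &[u32]) -> u32 {
    // SAFETY: the caller guarantees `slice` is non-empty, so index 0
    // is in bounds.
    unsafe { *slice.get_unchecked(0) }
}

/// Safe wrapper: establishes the invariant before crossing into unsafe.
fn first_or_zero(slice: &[u32]) -> u32 {
    if slice.is_empty() {
        return 0;
    }
    // SAFETY: we just checked that `slice` is non-empty.
    unsafe { first_unchecked(slice) }
}
```

The point of the convention is that every `unsafe` block carries a written justification a reviewer (human or agent) can check against the documented invariant.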
I also agree with the commenter who said these LLMs are today, at the present moment, good at Go. The only language I notice they seem to be really good at, above and beyond others, is JavaScript, I assume because there's so much of it.
It's almost guaranteed that with agents you could do the same job with less than half of 100k lines. I don't know what's impressive about lines of code generated by an agent.
The thing that impresses me most is that the author knows everything (from the high level architecture to the small details) of "multi-Paxos consensus engine" (I have no idea what it is, but it must be very complicated) and can write everything out for AI to read (or did he/she use an app to convert speech to text)?
Cool post. I don’t fully understand what a code contract is but appreciate the advice. I have settled on a similarly lightweight /agile folder where I keep my roadmap.md with epics and sprints.
This is a great example of AI slop and a big problem with AI coding.
The original RSL library has 36 KLoC across C++ source and header files. Rust is supposed to be more expressive and concise. Yet AI generated 130 KLoC. I guess nobody understands how this code works and nobody can tell if it actually works.
Paxos is certainly non-trivial in the sense that tiny changes can break it, but in terms of functionality it is not that big. 50 KLOC just seems like a lot of code to me.
The moment a language is the output of a natural language compiler, the language itself is kind of irrelevant.
Change the skills, ask the agent to do exactly the same in something else.
I am slowly focusing on agent orchestration tools, which make the actual programming language as relevant as doing SOA with BPEL.
I've found Rust's safety guarantees to be less useful for slop-generated code because LLMs can always fight their way through the borrow checker by spamming enough Arc<Mutex<Arc<Mutex<...>>>> and clone() everywhere. Rust only gives you safety properties, not liveness. Interior mutability is a fantastic tool for turning safety failures into liveness failures. Remember kids: deadlock is a safe outcome.
It works for humans because when we get a borrow-check failure, we take a step back and think about the global shape of our code and ownership. LLMs path straight to the goal. Problem: code doesn't compile. Solution: more clone()
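A minimal sketch of "deadlock is a safe outcome": the code below compiles without complaint from the borrow checker even though a second `lock()` on the same thread would never return. It uses `try_lock` instead of a real second `lock()` so the example terminates; the function name is hypothetical.

```rust
use std::sync::{Arc, Mutex, TryLockError};

/// Returns true if a second lock attempt would block while the first
/// guard is still held.
fn second_lock_would_block(shared: &Arc<Mutex<Vec<u32>>>) -> bool {
    // First lock: held until `_guard` goes out of scope at function end.
    let _guard = shared.lock().unwrap();

    // A plain `shared.lock()` here would type-check just the same. The
    // standard library documents that locking a mutex already held by
    // the current thread may deadlock or panic. `try_lock` reports the
    // conflict instead of hanging.
    let blocked = matches!(shared.try_lock(), Err(TryLockError::WouldBlock));
    blocked
}
```

Nothing here is a memory-safety violation, which is exactly why the compiler accepts it: the ownership conflict has been moved from compile time to run time.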
Is the idea of the runtime contracts similar to the idea of runtime validation? Or are they different in some way?
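The article's own definition isn't quoted in this thread, so the following is only one common reading of the distinction, sketched with hypothetical functions: a contract asserts what the programmer believes must already be true and aborts loudly if it is not, while validation handles untrusted input and returns an error the caller is expected to deal with.

```rust
/// Contract style (hypothetical example): the log must stay strictly
/// increasing. A violation is a bug in the caller, so we panic.
fn append_entry(log: &mut Vec<u64>, index: u64) {
    // Precondition: the new index must exceed the last one.
    assert!(
        log.last().map_or(true, |&last| index > last),
        "precondition violated: index {index} does not exceed the last entry"
    );
    let old_len = log.len();

    log.push(index);

    // Postconditions: exactly one entry added, ordering preserved.
    // `debug_assert!` is compiled out of release builds by default.
    debug_assert_eq!(log.len(), old_len + 1);
    debug_assert!(log.windows(2).all(|w| w[0] < w[1]));
}

/// Validation style: the input is untrusted text, so bad input is an
/// expected case and is reported as a value, not a panic.
fn parse_index(input: &str) -> Result<u64, String> {
    input
        .trim()
        .parse::<u64>()
        .map_err(|e| format!("invalid index {input:?}: {e}"))
}
```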
I have a Tarpaulin code coverage check, and every time coverage drops below the threshold Claude gives up quickly and just lowers the threshold. I don't know how to overcome it. Neither CLAUDE.md nor AGENTS.md helps; the LLM always finds its way around it.
How are you keeping the requirement, design, and tasks docs in sync as the code evolves? I'm curious if anyone's landed on a good workflow for this.
Rust code generation consumes a lot of tokens.
Go is a much better target. I've observed Rails/Ruby code is also much easier for AI to spit out.
And Haskell flies with AI
We MUST get programming languages and LLMs that do not ever change or break comments.
You can’t have contracts defined in comments in code because there’s no guarantee they won’t be deleted or changed.
Even better, we need the ability to embed directives to LLMs which are NOT comments, but a type of programming construct specifically for this purpose.
Rust is about abstractions more than code. You can ask AI to "Optimize/Test/Clarify" but at the end of the day you should be willing to blindly agree to its output or spend more time reviewing someone else's code.
Where can we read the code?
Seriously, "Learnings"? Learn you an English.
We're working on a large Rust codebase, with development heavily assisted by Claude and Codex, and one critical workflow is this: after you have written a spec, have the other LLM critique it thoroughly.
This back and forth will take quite a while, but the resulting implementation plan will be 10x better than the original.
You can automate this by giving Codex a goal, and a skill to call Claude to review the implementation spec until they both agree it's done.
Then, for critical code, have them both implement the spec in a worktree, then BOTH critique each other's implementation.
More often than not, Claude will say to take 2 or 3 pieces from its design over to Codex, but ship the Codex implementation.
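The "review until they both agree" loop for the spec can be sketched roughly as below. Everything here is hypothetical: the `Reviewer` trait stands in for whatever actually calls Claude or Codex, `RequiresSection` is a toy stub so the loop can be exercised without any model, and the round cap is an assumption added so the loop cannot run forever.

```rust
/// What a reviewer returns for a draft spec.
#[derive(Debug, Clone, PartialEq)]
enum Verdict {
    Approve,
    Revise(String), // the reviewer's revised draft
}

/// Hypothetical interface: in practice this would wrap an LLM call.
trait Reviewer {
    fn review(&self, spec: &str) -> Verdict;
}

/// Alternates between reviewers until every one of them has approved
/// the same draft in a row, or gives up after `max_rounds` reviews.
fn converge(spec: &str, reviewers: &[&dyn Reviewer], max_rounds: usize) -> Option<String> {
    if reviewers.is_empty() {
        return None;
    }
    let mut current = spec.to_string();
    let mut approvals = 0;
    for round in 0..max_rounds {
        match reviewers[round % reviewers.len()].review(&current) {
            Verdict::Approve => approvals += 1,
            Verdict::Revise(next) => {
                current = next;
                approvals = 0; // any change resets the agreement count
            }
        }
        if approvals == reviewers.len() {
            return Some(current);
        }
    }
    None
}

/// Toy stub: approves only if the spec contains a required section,
/// otherwise appends it.
struct RequiresSection(&'static str);

impl Reviewer for RequiresSection {
    fn review(&self, spec: &str) -> Verdict {
        if spec.contains(self.0) {
            Verdict::Approve
        } else {
            Verdict::Revise(format!("{spec}\n{}", self.0))
        }
    }
}
```

Resetting the approval count on every revision is the design choice that matters: a draft only counts as done when each reviewer has seen and accepted the final text, not an earlier version of it.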