I'd like to highlight a different part of the article:
> In general, when I talk to software folks about testing, I'm coming from such a different place that they immediately look at me like I'm an alien, so let's talk about how we tested at this hardware company I worked for, Centaur, which informs my biases about how I like to work. Some of the things that we did that were or are unorthodox in the software world are:
> - Hired dedicated QA / test engineers, with testing being a first-class career path on par with being a developer
> - No code review by default
> - Virtually no hand-written tests
> - Constant testing via what programmers sometimes called property based testing, randomized testing, fuzzing, etc., although we just called those tests (hand-written tests were called "hand tests").
> - Large regression test suite (3 months wall clock to execute on compute farm)
> - No unit tests
Anybody here tried that (or a similar) approach? Especially going all-in on property based testing and fuzzing with no unit tests.
I tried that approach somewhere before and the initial results were promising, but it ran into political issues, so the idea was canned.
No code review by default goes against the established evidence (of which there is little for software development practices) that code review is the best way to find defects.
I always get the impression from using hardware and other anecdotes like this that it is rare for hardware companies to know how to do software development well because their core competency is hardware. In fairness, it is uncommon for software companies to know how to do software development well.
Every form of testing has its value when done well, and they are using several forms that most software developers don't use, which probably helps make up for the lack of code review and unit tests. But if they incorporated code review and unit tests, their software would likely be even higher quality.
Property based testing is amazing, but it won't provide full coverage. Regression suites are amazing, but they are generally the most expensive form of testing, both in time to write and maintain the tests and in time to run them.
Today AI can crank out unit tests, so it's silly not to have them.
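To make the coverage point concrete, here is a minimal sketch of a bug that uniform random inputs are very unlikely to find. Everything in it is made up for illustration: `saturating_double`, the planted bug, and the `MAGIC` input are hypothetical and not taken from the article.

```python
import random

MAGIC = 0x8000_0000  # hypothetical edge-case input, chosen only for illustration


def saturating_double(x: int) -> int:
    """Double a 32-bit unsigned value, clamping at 2**32 - 1 (buggy on purpose)."""
    if x == MAGIC:  # deliberately planted bug on one exact input
        return 0
    return min(2 * x, 0xFFFF_FFFF)


def holds(x: int) -> bool:
    """Property under test: the result never drops below the input."""
    return saturating_double(x) >= x


def random_failures(trials: int, seed: int = 0) -> int:
    """Count property violations over uniformly random 32-bit inputs."""
    rng = random.Random(seed)
    return sum(not holds(rng.getrandbits(32)) for _ in range(trials))


if __name__ == "__main__":
    print("violations in random run:", random_failures(100_000))
    print("property holds at MAGIC:", holds(MAGIC))
```

Each random trial has a 1 in 2**32 chance of hitting `MAGIC`, so a run of this size will almost always report no violations, while a single targeted check on the boundary value fails immediately. Real property-based tools bias their generators toward boundary values, which helps, but the general point stands that random exploration and targeted tests cover different ground.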
I really wonder what "randomized testing" looks like in practice. What is the measure of success/failure?
I understand that for fuzzing you have a very basic "doesn't crash" metric. For property based tests, you have to write the properties for the PBTs to check. What is the randomized testing hitting?
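One possible answer, offered as an assumption rather than as what Centaur actually did: compare the thing under test against a simpler reference model on random inputs, so the pass/fail criterion is just "both agree". A minimal sketch, with a toy 32-bit adder standing in for the design (`reference_add32`, `ripple_add32`, and `run_random_tests` are all hypothetical names):

```python
import random


def reference_add32(a: int, b: int) -> tuple[int, bool]:
    """Simple model: 32-bit add returning (sum, carry_out)."""
    total = a + b
    return total & 0xFFFF_FFFF, total > 0xFFFF_FFFF


def ripple_add32(a: int, b: int) -> tuple[int, bool]:
    """Implementation under test: bit-by-bit ripple-carry adder."""
    result, carry = 0, 0
    for i in range(32):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result, bool(carry)


def run_random_tests(trials: int, seed: int) -> list[tuple[int, int]]:
    """Return every (a, b) where the implementation disagrees with the model."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(trials):
        a, b = rng.getrandbits(32), rng.getrandbits(32)
        if ripple_add32(a, b) != reference_add32(a, b):
            mismatches.append((a, b))
    return mismatches


if __name__ == "__main__":
    print("mismatches:", run_random_tests(trials=10_000, seed=1))
```

Here the "property" is agreement with the model, so nobody has to hand-write expected values per test. Recording the seed lets any mismatch be replayed exactly.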
The first thing I started wondering was "is this the same Centaur that comes up as 'CentaurHauls' on a CPUID (EAX=0)?"
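For anyone who hasn't looked at this before: CPUID with EAX=0 returns the 12-byte vendor ID split across EBX, EDX, and ECX, in that order, four ASCII bytes per register in little-endian layout. A small sketch of the packing (the helper names `vendor_string` and `vendor_registers` are made up here, and this only does the byte shuffling, it does not execute CPUID):

```python
import struct


def vendor_string(ebx: int, edx: int, ecx: int) -> str:
    """Assemble the 12-byte vendor ID from CPUID leaf 0 registers (EBX, EDX, ECX order)."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


def vendor_registers(vendor: str) -> tuple[int, int, int]:
    """Inverse: split a 12-character vendor ID into (EBX, EDX, ECX)."""
    return struct.unpack("<III", vendor.encode("ascii"))


if __name__ == "__main__":
    ebx, edx, ecx = vendor_registers("CentaurHauls")
    print(hex(ebx), hex(edx), hex(ecx))
    print(vendor_string(ebx, edx, ecx))
```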
I bake MCP tools into everything now, including doing screenshots. Any LLM can run just about every function, including resizing the window. I just watched Fabel 5 do a full usability test on a new project, copying the release cycle for my agentic terminal, relaunching the app like 20 times as it went, ensuring the move to a built and signed release was working. It installed the program like 5 times (something I do daily multiple times).
I noticed map tiles were not working and started to tell it, but then all of a sudden they reappeared. Checking the logs, I saw it had found the issue and corrected it on its own.
The key here is feedback loops and systems annealing.
As for Dan, my God I love this guy. Glad someone posted it!