Whenever I read about formal specs it always seems to me like “write the same tests just in a different way”, or worse, “write the same implementation but in a different way”.
I guess doing things twice can help catch errors, but I fail to see what’s so special about formal specs if they can suffer from the exact same bugs as the tests/implementation.
I guess the root of the problem is if you want to formally prove that a program does something, you have to be very specific (heh) about what that something is, at which point you are basically just writing tests/implementation all over again.
I have been looking into this on and off for years now, and I keep feeling like I’m missing something, but I don’t know what it is.
Can anyone enlighten me?
> Whenever I read about formal specs it always seems to me like “write the same tests just in a different way”, or worse, “write the same implementation but in a different way”. [...] Can anyone enlighten me?
A big difference is that formal methods allow you to use the "for all" quantifier.
For example, you might write a unit test that says "foo('abc') returns a string with no trailing whitespace".
But with formal methods, you can prove that "for any input x, foo(x) returns a string with no trailing whitespace".
This is a trivial example, but you could imagine something more complicated, such as "for any program P, compile(P) has the same behavior as P".
Of course, you have to define what "has the same behavior as" means!
The book "Program = Proof" by Samuel Mimram starts with a formula which is true for all n below
n = 15 341 178 777 673 149 429 167 740 440 969 249 338 310 889
I don't think you can catch it with any test suite.
It's more than just "write the same tests just in a different way", because each test is largely independent, or grows to unmanageably large, and you have to do the work of figuring out the completeness of your test suite by e.g. branch coverage or other relatively crude overlap methods.
Statically proven type systems let you do this in a way that each contributing piece is checked in advance against all the other pieces, guaranteeing not just "this test passes" but, in combination, "all these tests create a reasonable, coherent whole", and it disallows incoherent models from being implemented in the code. An example of this is like Rust's match, which requires complete coverage of all the possible branches, but writ large across an entire codebase.
You're right that it does nothing for you if you make a fundamental logic or theory error, it can't catch that kind of error, but you'd be surprised how much more robust it is than "merely" e.g. Haskell's type system + ad hoc functional testing + property testing - which I would consider a quite strong system overall, but formally proven e.g. cryptography is a much higher bar.
I don’t write software, but in asic and fpga design, formal method specifications enable the use of things like SAT solvers to determine if the underlying test article meets the specification.
You specify properties of the design and the tooling tests the design in a variety of ways to see if it can violate those properties.
Let’s say you have a system that controls turn signals. You can specify properties that say things like lanes that cross each other can’t both have a green light at the same time or even within some period of time of each other.
The tooling can exhaustively check the design for this and surface code traces that violate it.