I want to add something else to this. In the process of writing this, I also played with formal verification and formally verified the suggestion engine, which was a really nice side discovery.
The basic idea is to write a prove in Lean4 and then test both the production implementation (Haskell) and the Lean implementation against random inputs. Compare if the results are the same.
If that is the case -> you can be pretty sure the unproven production version is as correct as the proven lean version.
https://www.dev-log.me/formal_verification_in_any_language_f...