I think you could still do it against a real database—you're already setting it up to a known state before each test, right? Obviously there'd be more runs but I'd expect (hope) that each task would be sufficiently small that the number of permutations would stay within reason.
There would be some challenges for sure. Likely optimistic concurrent patterns would require an equivalent of loom's `yield_now` [1] to avoid getting stuck. And you'd probably need a way to detect one transaction waiting for another's lock to get out of situations like your update lock vs barrier example. I vaguely recall PostgreSQL might have some system catalog table for that or something.
Yeah, the more I think about it, the more exciting this idea gets. The walkthrough in the article shows exactly why - I intentionally (to later show why that is wrong) place the barrier between the SELECT and UPDATE, which deadlocks instead of triggering the race. Getting the placement right requires reasoning about where the critical interleaving point is. An exhaustive approach would surface both outcomes automatically: this placement deadlocks, this one exposes the bug, this one passes. That would remove the hardest part of writing these tests.