Hacker News

mrkeen today at 11:27 AM

I call it the "doing two things" problem.

You write imperative code that issues two commands, both of which can fail independently.

There are plenty of ways to pretend to 'deal with it'.

Firstly it will just pass all tests, so most devs can stop thinking about it right away.

A dev might think you can just catch and log the exception. Doesn't fix it.

You could run the code in prod for a while, see if it goes wrong. It will, at which point the dev will try it again, and it will probably work the second time, so they can stop thinking about it.

There was a big outbox pattern discussion a couple of days ago (split thing 1 into two halves, and do them atomically, leave thing 2 as an exercise for the reader.)
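A rough sketch of the outbox idea as I understand it, not anyone's production code. Everything here is hypothetical: `OutboxSketch`, `placeOrder`, `relay`, and the in-memory lists, with `synchronized` standing in for a real database transaction. The business write and the "message to send" are recorded together, and a separate relay step publishes later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class OutboxSketch {
    // Assumption: these lists stand in for two tables in the SAME database.
    private final List<String> orders = new ArrayList<>();
    private final List<String> outbox = new ArrayList<>();

    /** "Thing 1": record the order and the pending message together
     *  (synchronized stands in for a single DB transaction). */
    public synchronized void placeOrder(String orderId) {
        orders.add(orderId);
        outbox.add("OrderPlaced:" + orderId);
    }

    /** "Thing 2": publish pending messages in order. A message is removed
     *  only after the publisher accepts it. Returns how many were published. */
    public synchronized int relay(Consumer<String> publisher) {
        int sent = 0;
        while (!outbox.isEmpty()) {
            try {
                publisher.accept(outbox.get(0));
            } catch (RuntimeException e) {
                break; // leave the message queued; the next relay run retries it
            }
            outbox.remove(0);
            sent++;
        }
        return sent;
    }

    /** Number of messages still waiting to be published. */
    public synchronized int pending() {
        return outbox.size();
    }
}
```

Note what this does not solve: in a real system, a crash after the publish but before the removal means the message goes out twice, so delivery is at-least-once and the consumer has to deduplicate. That is the "exercise for the reader" part.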

I think the reason you encounter this problem in the real world is that devs just exist in some quantum superposition of "it won't happen" and "I fixed it" and "it can't be fixed".


Replies

jmyeet today at 1:21 PM

> A dev might think you can just catch and log the exception. Doesn't fix it.

You've just succinctly made the argument against checked exceptions FWIW (which I agree with you on). Anyone who has used Java in anger (is there any other way?) will be familiar with:

    try {
      doSomething();
    } catch (CheckedException e) {
      // Logged and swallowed: the caller never learns the call failed.
      logger.error("Didn't work", e);
    }

Fault tolerance in general is terrible in most software. One of my biggest bugbears is network latency and transient failures in network requests that would be solved with a simple retry. But no, the user just gets an incredibly lazy "Request failed" dialog. That's the equivalent of the "log and silently swallow" pattern above. It can get a lot worse than that too. I have an app on my phone that will log me out and force me into a 2FA cycle if it hits a network timeout. Like... WHY?!?! Anyway, I digress...
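For what it's worth, the "simple retry" I mean looks roughly like this. It's a minimal sketch, not a library API: `RetryExample`, `withRetry`, and its parameters are names I made up, and it assumes transient failures surface as `IOException` and that the request is safe to repeat.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetryExample {
    /**
     * Runs the call, retrying on IOException up to maxAttempts times,
     * doubling the delay between attempts. Other exceptions propagate at once.
     * Assumption: the call is idempotent, so repeating it is harmless.
     */
    public static <T> T withRetry(Callable<T> call, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                last = e; // remember the failure in case every attempt fails
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2; // exponential backoff
                }
            }
        }
        throw last; // all attempts failed: surface the error, don't swallow it
    }
}
```

The idempotency assumption matters. If the request is one of two things that can fail independently, blindly repeating it can do the first thing twice.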

This is largely a software issue. Control systems are built to handle these kinds of things. A traffic light can't accidentally show green in two directions. It's literally wired so that this is impossible, because it's simply too important to leave to chance. You constantly have to deal with faulty sensors, so you have systems that seek a consensus from 3+ sensors and, if that fails, the system stays in a failed state until you fix it.
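The sensor consensus idea can be sketched in a few lines. This is an illustration only: `SensorVote`, `consensus`, and the tolerance parameter are my own hypothetical names, and real control systems use far more careful voting schemes. It takes the median reading and accepts it only if a strict majority of sensors agree with it within the tolerance. Otherwise it returns nothing, and the caller is expected to fail safe.

```java
import java.util.Arrays;
import java.util.OptionalDouble;

public class SensorVote {
    /**
     * Returns the median reading if a strict majority of sensors are within
     * `tolerance` of it; otherwise empty, meaning "no consensus, fail safe".
     * Requires at least 3 readings.
     */
    public static OptionalDouble consensus(double[] readings, double tolerance) {
        if (readings.length < 3) {
            return OptionalDouble.empty();
        }
        double[] sorted = readings.clone(); // don't mutate the caller's array
        Arrays.sort(sorted);
        double median = sorted[sorted.length / 2]; // upper median for even counts
        int agreeing = 0;
        for (double r : sorted) {
            if (Math.abs(r - median) <= tolerance) {
                agreeing++;
            }
        }
        return agreeing * 2 > sorted.length
                ? OptionalDouble.of(median)
                : OptionalDouble.empty();
    }
}
```

With readings of 20.1, 20.3 and 95.0 and a tolerance of 0.5, the faulty 95.0 is outvoted and the result is 20.3. With 10, 50 and 90 there is no majority, so the result is empty.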

But in software the standards just seem to be much lower, even though it can be critical, even lethal, e.g. [1]. Network interfaces should be fuzzed. Every IO operation should be assumed capable of failing and be tested for when it does. Every IO operation should also be assumed capable of producing unexpected output. And it's simply cost-cutting and a lack of regulation that allow this sloppiness to persist. There should certainly be strict liability for any companies that allow this to happen.

[1]: https://ethicsunwrapped.utexas.edu/case-study/therac-25

inigyou today at 12:36 PM

Why can the two things even fail at all?
