logoalt Hacker News

noelwelshyesterday at 4:57 PM1 replyview on HN

In 2 out of 3 problematic bugs I've had in the last two years or so were in statically typed languages where previous developers didn't use the type system effectively.

One bug was in a system that had an Email type but didn't actually enforce the invariants of emails. The one that caused the problem was it didn't enforce case insensitive comparisons. Trivial to fix, but it was encased in layers of stuff that made tracking it down difficult.

The other was a home grown ORM that used the same optional / maybe type to represent both "leave this column as the default" and "set this column to null". It should be obvious how this could go wrong. Easy to fix but it fucked up some production data.

Both of these are failures to apply "parse, don't validate". The form didn't enforce the invariants it had supposedly parsed the data into. The latter didn't differentiate two different parsing.


Replies

rzwitserlootyesterday at 5:09 PM

that's a bit of a hairy situation. You're doing it wrong. Or not really, but.. complicated.

As per [RFC 5321](https://www.rfc-editor.org/rfc/rfc5321.html):

> the local-part MUST be interpreted and assigned semantics only by the host specified in the domain part of the address.

You're not allowed to do that. The email address `[email protected]` is identical to `[email protected]`, but not necessarily identical to `[email protected]`. If we're going to talk about 'commonly applied normalisations at most email providers', where do you draw that line? Should `[email protected]` be considered equal to `[email protected]`? That souds weird, except - that is exactly how gmail works, a couple of other mail providers have taken up that particular torch, and if your aim is to uniquely identify a 'recipient', you can hardcode that `[email protected]` and `[email protected]` definitely, guaranteed, end up at the same mailbox.

In practice, yes, users _expect_ that email addresses are case insensitive. Not just users, even - various intermediate systems apply the same incorrect logic.

This gets to an intriguing aspect of hardcoding types: You lose the flex, mostly. types are still better - the alternative is that you reliably attempt to write the same logic (or at least a call to some logic) to disentangle this mess every time you do anything with a string you happen to know is an email address which is terrible but gives you the option of intentionally not doing that if you don't want to apply the usual logic.

That's no way to program, and thus actual types and the general trend that comes with it (namely: We do this right, we write that once, and there is no flexibility left). Programming is too hard to leave room for exotic cases that programmers aren't going to think about when dealing with this concept. And if you do need to deal with it, it can still be encoded in the type, but that then makes visible things that in untyped systems are invisible (if my email type only has a '.compare(boolean caseSensitive)' style method, and is not itself inherently comparable because of the case sensitivity thing, that makes it _seem_ much more complicated than plain old strings. This is a lie - emails in strings *IS* complicated. They just are. You can't make that go away. But you can hide it, and shoving all data in overly generic data types (numbers and strings) tends to do that.

show 1 reply