logoalt Hacker News

5NF and Database Design

165 pointsby petalmindyesterday at 4:22 PM60 commentsview on HN

Comments

sgarlandyesterday at 11:23 PM

NOTE: this is critiquing the author's 4NF definition (from a link in TFA), not TFA itself.

> If you read any text that defines 4NF, the first new term you hear is “multivalued dependency”. [Kent 1983] also uses “multivalued facts”. I may be dumb but I only very recently realized that it means just “a list of unique values”. Here it would be even better to say that it’s a list of unique IDs.

This is an inaccurate characterization, and the rest of the post only makes sense when viewed through this strawman. The reason 4NF is explained in the "weird, roundabout way" is because it demonstrates [one of] the precise problem[s] the normal form sets out to solve: a combinatorial explosion of rows.

If you have a table:

    CREATE TABLE Product(
      product_id INT NOT NULL,
      supplier_id INT NOT NULL,
      warehouse_id INT NOT NULL
    );
If you only ever add an additional supplier or an additional warehouse for a given product, it's only adding one row. But if you add both to the same product, you now have 4 rows for a single product; if you add 5 suppliers and 3 warehouses to the same product, you now have 15 rows for a single product, etc. This fact might be lost on someone if they're creating a table with future expansion in mind without thinking it through, because they'd never hit the cross-product, so the design would seem reasonable.

The conclusion reached (modulo treating an array as an atomic value) is in fact in 4NF, but it doesn't make any sense why it's needed if you redefine multivalued dependency to mean a set.

jerfyesterday at 5:59 PM

In a roundabout way this article captures well why I don't really like thinking in terms of "normal forms", especially as a numbered list like that. The key insights are really 1. Avoid redundancy and 2. This may involve synthesizing relationships that don't immediately obviously exist from a human perspective. Both of those can be expanded on at quite some length, but I never found much value in the supposedly-blessed intermediate points represented by the nominally numbered "forms". I don't find them useful either for thinking about the problem or for communicating about it.

Someone, somewhere writing down a list and that list being blessed with the imprimatur of Academic Approval (TM) doesn't mean it is actually useful... sometimes it just means that it made it easy to write multiple choice test questions. (e.g., "What does Layer 2 of the OSI network model represent? A: ... B: ... C: ... D: ..." to which the most appropriate real-world answer is "Who cares?")

show 3 replies
cremeryesterday at 8:29 PM

The numbered forms are most useful as a teaching device, not an engineering specification. Once you have internalized 2NF and 3NF violations through a few painful bugs, you start spotting partial and transitive dependencies by feel rather than by running through definitions. The forms gave you the vocabulary. The bugs gave you the instinct..

minkeymaniacyesterday at 7:36 PM

Normalize till it hurts, then denormalize till it works!

show 1 reply
ibrahimhossaintoday at 6:35 AM

The article makes a good point about when 5nf becomes impractical. In my experience, stopping at BCNF or 4nf often strikes a better balance unless you have very clear join dependencies. How do others decide where to stop normalizing in real world apps?

estetlinusyesterday at 5:19 PM

The lost art of normalizing databases. ”Why is the ARR so high on client X? Oh, we’re counting it 11 times lol”.

I would maybe throw in date as an key too. Bad idea?

show 2 replies
Quarrelsomeyesterday at 7:39 PM

Especially loved the article linked that was dissing down formal definitions of 4NF.

revaltoday at 1:44 AM

I haven’t finished reading this but I am commenting because of the form. Lead with the conclusions, table of contents, and then sources? This is someone who is confident in what they write. I wish more writing trusted the audience to decide if the writing were important instead of stringing the audience allow. Keep up the good work.

iFireyesterday at 6:55 PM

https://en.wikipedia.org/wiki/Essential_tuple_normal_form is cool!

Since I had bad memory, I asked the ai to make me a mnemonic:

* Every

* Table

* Needs

* Full-keys (in its joins)

show 1 reply
tadfisheryesterday at 4:44 PM

I love reading about the normal forms, because it makes me sound like I know what I'm talking about in the conversation where the backend folks tell me, "if we normalized that data then the database would go down". This is usually followed by arguments over UUID versions for some reason.

show 1 reply
akdev1lyesterday at 8:12 PM

My brain has been blunted too far due to dynamodb and NoSQL storage usage and now I can’t even normalize anymore

artyomtoday at 1:43 AM

Color me impressed. Even being very well versed in database design myself, this is just pragmatic and straight to the point, the way I'd have liked it back in the day.

I think the main problem of how 4NF and 5NF formal definitions were taught is that essentially common sense (which is mostly "sufficient" to understand 1NF-3NF) starts to slip away, and you start needing the mathematical background that Ed Codd (and others) had. And trying to avoid that is how those weird examples came up.

carlyaiyesterday at 6:15 PM

love this

DeathArrowyesterday at 5:08 PM

There are use cases where is better to not normalize the data.

show 4 replies