Hacker News

rkagerer yesterday at 11:22 AM, 5 replies

That's great, but I abhor UUIDs.

I see them crop up everywhere. IMO, they are decidedly human-unfriendly - particularly to programmers and database admins trying to debug issues. Too many digits to deal with, and they suck up too much column width in query results, spreadsheets, reports, etc.

I'm not saying they don't have a place (e.g. when you have a genuine need to generate unique identifiers across completely disconnected locations, and the IDs will generally never need to be dealt with by a human). But in practice they've been abused to do everything under the sun (filenames, URL links, user IDs, transaction numbers, database primary keys, etc). I almost want to start a website with a gallery of all the examples where they've been unsuitably shoehorned in when just a little more consideration would have produced something more humane.

For most common purposes, a conventional, centralized dispenser is better. Akin to the Take-A-Number reels you see at the deli. Deterministic randomization is a thing if you don't want the numbers to count sequentially. Prefixing the IDs, or sharding the ID space, is also a thing if you need uniqueness across different latency boundaries (like disparate datacenters or siloed servers).
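A minimal sketch of the take-a-number idea with a per-site prefix, in Python. The `Dispenser` class and the `EU`/`US` prefixes are made up for illustration, and the counter lives in memory only; a real dispenser would presumably persist its state somewhere durable.

```python
import itertools


class Dispenser:
    """Hands out short, sequential IDs for one site (hypothetical helper)."""

    def __init__(self, prefix: str, start: int = 1):
        self.prefix = prefix
        # itertools.count yields start, start + 1, start + 2, ...
        self._counter = itertools.count(start)

    def next_id(self) -> str:
        # The prefix keeps IDs unique across sites without any coordination.
        return f"{self.prefix}-{next(self._counter):06d}"


europe = Dispenser("EU")
america = Dispenser("US")
print(europe.next_id(), europe.next_id(), america.next_id())
```

Each site only ever talks to its own dispenser, so the two ID streams cannot collide even if the sites never see each other.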

I've lost count of how many times I've seen a UUID generated when what the designer really should have done is just grab the primary key (or when that's awkward, the result of a GetNextId stored procedure) from their database.


Replies

3eb7988a1663 yesterday at 4:05 PM

At a prior job, there was an internal project code system for tracking billable hours, people assignments, that kind of thing. Everyone knew the codes of their projects. It was a six-character code, two letters and then four digits, giving you a space of some ~7 million codes (26 × 26 × 10,000). The company was ~100 years old and had only recorded some 15k codes in all its history. The list of codes was manually updated once a quarter by an admin who might add another ten at a time.

Some chucklehead decided to replace the system with UUIDs. Now the codes are no longer human memorable/readable/writable on paper/anything useful. Even better, they must have used some homegrown implementation, because the codes were weirdly sequential. If you ever had to look at a dump of codes, the IDs were almost identical apart from a digit somewhere in the middle.

Destructive change that ruined a perfectly functional system.

staticassertion yesterday at 8:57 PM

People should really just use integers.

It's funny how fast it is to just implement a counter and how much people rely on UUIDs to avoid it. If you already use Postgres somewhere, just create a "counter" table for your namespace. You can easily count 10k-100k values per second or faster, with room to grow if you outscale that.
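A rough sketch of such a counter table, in Python. Note the assumptions: it uses the standard-library `sqlite3` module instead of Postgres so that it runs without a server, and the `counter` table layout and the `next_id` helper are invented for illustration. In Postgres the same shape would more likely be a single `UPDATE ... RETURNING` statement or a sequence.

```python
import sqlite3


def open_counters(path: str = ":memory:") -> sqlite3.Connection:
    # isolation_level=None puts the connection in autocommit mode,
    # so the explicit BEGIN/COMMIT below control the transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS counter ("
        " namespace TEXT PRIMARY KEY,"
        " value INTEGER NOT NULL)"
    )
    return conn


def next_id(conn: sqlite3.Connection, namespace: str) -> int:
    """Atomically bump and return the counter for one namespace."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        conn.execute(
            "INSERT OR IGNORE INTO counter (namespace, value) VALUES (?, 0)",
            (namespace,),
        )
        conn.execute(
            "UPDATE counter SET value = value + 1 WHERE namespace = ?",
            (namespace,),
        )
        (value,) = conn.execute(
            "SELECT value FROM counter WHERE namespace = ?", (namespace,)
        ).fetchone()
        conn.execute("COMMIT")
        return value
    except Exception:
        conn.execute("ROLLBACK")
        raise


conn = open_counters()
print([next_id(conn, "orders") for _ in range(3)], next_id(conn, "users"))
```

Each namespace gets its own row, so unrelated ID streams do not consume each other's numbers.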

What do you get? The most efficient, compressible little integers you could ever want. You unlock data structures like roaring bitmaps and treemaps. You cut memory to 25% or less of a 128-bit UUID, depending on your cardinality (i.e. you can sometimes use u32 or even u16 in memory). You get insane compression benefits, where rows of these integers take a few bits of data each after compression. You get faster hashmap lookups. It's just insane how this compounds into crazy downstream wins.

It is absolutely insane how little it costs to do this and how many optimizations you unlock. But people somehow think that ID generation will be their bottleneck, or maybe it's just easier to avoid a DB sometimes, or whatever, and so we see UUIDs everywhere. Although, agreed that most of the time you can just generate the unique ID for data yourself.

In fairness, UUID is easier, but damn, it wrecks performance.

teeray yesterday at 1:09 PM

I just wish there was some human element to them so they were easier to talk about. Something like:

BASKETBALL-9a176cbe-7655-4850-9e7f-b98c4b3b4704-FISH

CAKE-3a01d58f-59d3-4b0c-87dc-4152c816f442-POTATO

“Which row was it, ‘basketball fish’ or ‘cake potato’?”

Of course, the words would need to be a checksum, because as soon as you introduce them, nobody is going to look at the hex again. That would still be an improvement, since nobody looks at all of the hex now anyway: it's always “the one ending in ‘4ab’”.
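One way the checksum words could work, sketched in Python. Everything here is an assumption for illustration: the 16-entry `WORDS` list, the use of SHA-256, and the `tag`/`verify` helpers. With only 16 words the pair carries just 8 bits of checksum, so a real word list would want to be much larger.

```python
import hashlib
import uuid

# Hypothetical word list; entries must not contain "-".
WORDS = [
    "BASKETBALL", "FISH", "CAKE", "POTATO", "LAMP", "RIVER", "TIGER", "CLOUD",
    "PIANO", "ROCKET", "MANGO", "CASTLE", "PENCIL", "WALRUS", "GARDEN", "HAMMER",
]


def checksum_words(u: uuid.UUID) -> tuple:
    # Derive both words from a hash of the UUID bytes, so they act as a
    # checksum of the hex rather than as independent labels.
    digest = hashlib.sha256(u.bytes).digest()
    return WORDS[digest[0] % len(WORDS)], WORDS[digest[1] % len(WORDS)]


def tag(u: uuid.UUID) -> str:
    first, last = checksum_words(u)
    return f"{first}-{u}-{last}"


def verify(tagged: str) -> bool:
    # The first and last "-"-separated fields are the words; the rest is the UUID.
    first, *middle, last = tagged.split("-")
    try:
        u = uuid.UUID("-".join(middle))
    except ValueError:
        return False
    return (first, last) == checksum_words(u)


tagged = tag(uuid.UUID("9a176cbe-7655-4850-9e7f-b98c4b3b4704"))
print(tagged, verify(tagged))
```

Because the words are computed from the hex, a row quoted as two words can be checked against the full identifier instead of being trusted blindly.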


foresto yesterday at 6:03 PM

> Deterministic randomization is a thing if you don't want the numbers to count sequentially.

What are your favorite ways to approach this?

I think a maximal-period linear feedback shift register might fit well.
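For reference, a small Python sketch of that idea using the textbook 16-bit Fibonacci LFSR with taps at bits 16, 14, 13 and 11 (polynomial x^16 + x^14 + x^13 + x^11 + 1). The function names and the default seed are arbitrary choices, and 16 bits is only for demonstration; a real ID space would need a wider register with its own maximal-length taps.

```python
def lfsr16_step(state: int) -> int:
    """Advance a 16-bit Fibonacci LFSR by one step."""
    # XOR the tapped bits (16, 14, 13, 11, counted from the left) to form
    # the new bit, then shift right and feed that bit in at the top.
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def lfsr16_ids(seed: int = 0xACE1):
    """Yield every nonzero 16-bit value exactly once, in scrambled order."""
    if not 0 < seed < (1 << 16):
        raise ValueError("seed must be a nonzero 16-bit value")
    state = seed
    while True:
        yield state
        state = lfsr16_step(state)
        if state == seed:  # back at the start: the full cycle is done
            return


ids = list(lfsr16_ids())
print(len(ids), len(set(ids)))
```

The sequence never repeats until all 65,535 nonzero states have been used, so it behaves like a counter whose output does not look sequential. It is not secret, though: anyone who knows the taps can compute the next ID.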