logoalt Hacker News

The perils of UUID primary keys in SQLite

132 pointsby emschwartzyesterday at 11:13 PM75 commentsview on HN

Comments

blopkertoday at 12:43 AM

UUIDs are way over used. There is almost always a better key to use, usually a bigint for databases. If you're making some kind of leaderless distributed data store, then maybe, but even then there are other ID sharding strategies I'd go for first depending on the constraints.

For a single database, bigints are smaller and faster, with less footguns.

UUIDs can be nice for an opaque public ID, however I'd still prefer something like a Sqid for space and usability.

show 9 replies
adityaathalyetoday at 11:29 AM

Thanks for the benching, Anders! So grateful for the stuff you've shared over the years. Invariably, every single post has been useful and/or educational to me.

I read this post more as an illustration of the *value* of UUIDv7 as primary key, over integer primary keys, in lieu of minimal loss of read/write performance, and marginally more data on disk bloat.

SQLite's automatic integer rowID primary key is a no-brainer, when the SQLite application is local-only, such as application storage format (mobile and desktop). Or is never intended to grow beyond a single server instance. Basically, where each SQLite file is private to a singular instance of the application.

However, if there is even an outside chance of needing to cooperate across application instances, e.g. the minimal limit case of a personal knowledge base that should seamlessly sync across a person's devices, as well as a hosted service, then a high-quality sequential random ID starts to make a lot more sense. (No-brainer arbitrary table merges / splits / remerges, de-duplication, etc.)

Random ID primary key is a bad idea period, whether it be the UU kind or the SQ kind, or any other kind. As far as my DB knowledge goes, this class of ID destroys all tree-algorithms, and we are stuck with the fact that there is no practically better way, than an appropriate tree-structure, to group and organise a meaningful amount of data, efficiently and effectively.

show 2 replies
jdthediscipletoday at 8:27 AM

So UUID isn't the problem but UUID v4 is, just like any random ID-scheme, correct?

UUID v7 so far seems like the best solution if you want UUID benefits and ordering.

show 1 reply
ItsBobtoday at 8:37 AM

My rule for primary keys and id's is simple: Sequential integer (or bigint) as the PK and if I need to make it public, I have a GUID (or UUID) in the row too, e.g. tbl_person would have Id (int|bigint) and person_guid as (UUID).

The Integer id is used for joins and looks ups and such but that's it. If I need to send anything to the frontend or outside of the app/DB then that's the UUID.

andersmurphytoday at 6:35 AM

This is actually a draft. I Wanted to add more details about how this changes with row size etc. I might get time to update it later today.

show 1 reply
bambaxtoday at 7:49 AM

Why would you use UUIDs a primary keys? Let SQLite use rowids internally (which is automatic and invisible), and have a different (indexed) column with UUID if you need that for publishing the ID somewhere.

show 2 replies
pyuser583today at 2:48 AM

Oh gosh the ints v uuids debate for pks. This is worse than vim v eMacs or brackets v braces.

ac50hztoday at 7:42 AM

I enjoy these carefully worded posts from Anders Murphy, illustrative and informative, not opinionated and preachy. Very useful, it’s great to see the process, and ofc bookmarkeable material for sharing with others.

w10-1today at 1:44 AM

Isn't the solution just to use the rowid (after doing the read-id-after-insert dance)?

How much trouble does SQLite reysing rowid's actually cause?

show 1 reply
kjgkjhfkjftoday at 10:53 AM

The script to create the benchmark numbers appears to be inserting 100 batches, not 10. (The benchmark numbers in the table appear to be consistent with the text, so I guess the actual script used to create them was correct.)

show 1 reply
yepyouknoyesterday at 11:29 PM

Perils of “UUIDv4”. Everyone knows that’s what UUIDv7 was really for, and you should always convert that to binary to optimize everything.

show 3 replies
cropcirclbureautoday at 7:15 AM

Is this relevant for other databases? For postgres for example, which supports concurrent writers, wouldn't sequential keys lead to contention on the page at the frontier?

show 1 reply
dumbledorfyesterday at 11:38 PM

Wait how is sqlite doing a million inserts a second?

show 3 replies
michaelcampbelltoday at 11:58 AM

How much time is `(random-uuid7-bytes)` taking?

knightops_devtoday at 5:44 PM

[flagged]

wood_spirittoday at 5:52 AM

If you need (or want the convenience of) a uuid and the time of creation is not secret then use ulids eg uuid v7.