So if I understand this correctly, there are three main approaches:
1. SKIP LOCKED family
2. Partition-based + DROP old partitions (no VACUUM required)
3. TRUNCATE family (PgQue’s approach)
And the benefit of PgQue is the failure mode, when a worker gets stuck:
- Table grows indefinitely, instead of
- VACUUM-starved death spiral
And a table growing is easier to reason about operationally?
Taxonomy is correct. But the benefit isn't "table grows indefinitely vs. vacuum-starved death spiral"
in all three approaches, if the consumer falls behind, events accumulate
The real distinction is cost per event under MVCC pressure. Under held xmin (idle-in-transaction, long-running writer, lagging logical slot, physical standby with hot_standby_feedback=on):
1. SKIP LOCKED systems: every DELETE or UPDATE creates a dead tuple that autovacuum can't reclaim (xmin is frozen). Indexes bloat. Each subsequent FOR UPDATE SKIP LOCKED scans don't help.
2. Partition + DROP (some SKIP LOCKED systems already support it, e.g. PGMQ): old partitions drop cleanly, but the active partition is still DELETE-based and accumulates dead tuples — same pathology within the active window, just bounded by retention. Another thing is that DROPping and attaching/detaching partitions is more painful than working with a few existing ones and using TRUNCATE.
3. PgQue / PgQ: active event table is INSERT-only. Each consumer remembers its own pointer (ID of last event processed) independently. CPU stays flat under xmin pressure.
I posted a few more benchmark charts on my LinkedIn and Twitter, and plan to post an article explaining all this with examples. Among them was a demo where 30-min-held-xmin bench at 2000 ev/s: PgQue sustains full producer rate at ~14% CPU; SKIP LOCKED queues pinned at 55-87% CPU with throughput dropping 20-80% and what's even worse, after xmin horizon gets unblocked, not all of them recovered / caught up consuming withing next 30 min.