joshka • yesterday at 7:08 PM • 6 replies

This feels like the sort of architecture that starts clean and then gradually grows most of the things a workflow-native system already has. I've seen systems like this, seen companies that are built out of this idea, and built small systems like this over time.

Once you need retries, backoff, timeouts, cancellation, versioning, visibility, task routing, rate limits, leases, heartbeats, stuck-worker detection, replay/debugging semantics, workflow migration, fanout/fanin, long timers, audit trails, and operator tooling, the “just use a database” story becomes “build a poor copy of a workflow engine plus a bunch of workers” pretty quickly.

That may still be a good tradeoff for many applications, especially if Postgres is already the core operational dependency. But the comparison shouldn’t be “database vs overcomplicated orchestrator.” It’s more like “what complexity do you want to own, and what do you want to buy / offload to a professional system?”
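To make the scope concrete, here is a minimal sketch of just four items from that list: retries, exponential backoff, leases with heartbeats, and stuck-worker detection. It is an in-memory stand-in for a queue table, not any particular engine's design; `LeaseQueue`, `Task`, and every parameter name are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    id: int
    attempts: int = 0
    run_at: float = 0.0        # earliest time the task may be claimed
    lease_until: float = 0.0   # the current claim is valid until this time
    owner: Optional[str] = None
    state: str = "pending"     # pending | running | done | dead


class LeaseQueue:
    """Hypothetical in-memory stand-in for a database-backed queue table."""

    def __init__(self, lease_s=30.0, base_backoff_s=1.0, max_attempts=3):
        self.lease_s = lease_s
        self.base_backoff_s = base_backoff_s
        self.max_attempts = max_attempts
        self.tasks = {}

    def enqueue(self, task_id, now):
        self.tasks[task_id] = Task(id=task_id, run_at=now)

    def claim(self, worker, now):
        """Hand the first due task to `worker` under a time-limited lease."""
        self.reap(now)
        for t in self.tasks.values():
            if t.state == "pending" and t.run_at <= now:
                t.state, t.owner = "running", worker
                t.attempts += 1
                t.lease_until = now + self.lease_s
                return t
        return None

    def heartbeat(self, task_id, worker, now):
        """Extend the lease, but only for the current owner of a live lease."""
        t = self.tasks[task_id]
        if t.state == "running" and t.owner == worker and t.lease_until > now:
            t.lease_until = now + self.lease_s
            return True
        return False

    def complete(self, task_id, worker):
        t = self.tasks[task_id]
        if t.state == "running" and t.owner == worker:
            t.state, t.owner = "done", None
            return True
        return False

    def fail(self, task_id, worker, now):
        t = self.tasks[task_id]
        if t.state == "running" and t.owner == worker:
            self._retry_or_bury(t, now)

    def reap(self, now):
        """Stuck-worker detection: an expired lease counts as a failed attempt."""
        for t in self.tasks.values():
            if t.state == "running" and t.lease_until <= now:
                self._retry_or_bury(t, now)

    def _retry_or_bury(self, t, now):
        t.owner = None
        if t.attempts >= self.max_attempts:
            t.state = "dead"  # retries exhausted; needs operator attention
        else:
            t.state = "pending"
            # Exponential backoff: base, 2x base, 4x base, ...
            t.run_at = now + self.base_backoff_s * 2 ** (t.attempts - 1)
```

Even this toy leaves out cancellation, versioning, rate limits, fanout/fanin, audit trails, and tooling, and a real version would also need each of these state transitions to be atomic in the database.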

Replies

hmaxdml • yesterday at 11:03 PM

Yeah, we've observed that too: people start implementing their own retry logic, idempotency, etc. But then they grow a hard-to-maintain, complex stack that's not their core business logic. There's a reason why there is a dedicated team building DBOS, every day: it's not that easy to build a solid durable workflow engine on Postgres.

UltraSane • today at 10:38 AM

Comments like this by people who know exactly what they are talking about are why I love Hacker News

cpursley • today at 6:20 AM

https://github.com/pgmq/pgmq

epolanski • yesterday at 11:15 PM

Bingo, not to mention that the blog post assumes all steps are serializable.

I feel like this is the usual "just use postgres" garbage post that lacks any kind of nuance.

In fact you could replace that post with any other db and the statements keep being true, and naive.

tomcam • today at 6:36 AM

Ridiculously good analysis! HN is a national treasure because of posts like this.

nulltrace • yesterday at 11:46 PM

The SKIP LOCKED pattern is fine until the worker count climbs. Then vacuum can't keep up: dead tuples pile up and the visibility map turns to Swiss cheese. The queue table is tiny on disk, but the planner thinks it's huge and stops using the index. It gets ugly fast.
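For readers who haven't seen the pattern, a rough sketch of a SKIP LOCKED claim query, plus one way to watch for the dead-tuple buildup described above. The `jobs` table and its `id` and `status` columns are assumptions made for illustration, and `cur` is assumed to be a DB-API cursor from a Postgres driver such as psycopg. `n_live_tup` and `n_dead_tup` in `pg_stat_user_tables` are estimates, so treat the ratio as a rough signal rather than a measurement.

```python
# Claim one pending job. Rows already locked by other workers are skipped
# rather than waited on, which is what lets many workers poll concurrently.
CLAIM_SQL = """
WITH next_job AS (
    SELECT id
    FROM jobs
    WHERE status = 'pending'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
UPDATE jobs
SET status = 'running'
FROM next_job
WHERE jobs.id = next_job.id
RETURNING jobs.id
"""

# Estimated live and dead tuple counts for one table.
BLOAT_SQL = """
SELECT n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = %s
"""


def claim_job(cur):
    """Return the id of the claimed job, or None if nothing is available."""
    cur.execute(CLAIM_SQL)
    row = cur.fetchone()
    return row[0] if row else None


def dead_tuple_ratio(cur, table="jobs"):
    """Return dead / (live + dead) for `table`, or None if it has no stats row."""
    cur.execute(BLOAT_SQL, (table,))
    row = cur.fetchone()
    if row is None:
        return None
    live, dead = row
    total = live + dead
    return dead / total if total else 0.0
```

Every claim and every completion updates a row, so a busy queue produces dead tuples continuously. A ratio that stays high between autovacuum runs would be consistent with the failure mode in the comment, though it does not confirm the planner behavior by itself.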
