It scales beyond the needs that most people have in most situations.
The constant problem is that "big scale" always means "larger than I've seen", so on any project larger than what a person has encountered before, they assume they need to pull out the big guns. People also worry about things like what happens if they really *do* scale 10 years from now.
Neither is a practical concern for nearly anyone who will ever face this decision.
And then yes, of course, some people have problems that actually can't be solved by Postgres. But verify this first, don't assume.
Postgres can scale, just like any database can scale.
The issue is there's a lot of lore and esoterica required to get it to scale.
Every time there's an issue with Postgres there's someone that's all "just do xyzzy, it's super obvious that this undocumented setting fixes it."
Just look at the slowdown/big pages problem.
If it works for you, use it.
I thought this was a fun article from a couple months back: https://openai.com/index/scaling-postgresql/
This post conflates scalability and performance. PostgreSQL is fast on smallish systems, but try adding more CPU cores and you'll see the performance gains are not linear at all. Modern servers can ship with 256 or more cores, and a single PostgreSQL instance will struggle to take advantage of them.
4-8 cores is no problem at all, though.
They can adjust their checkpoint settings to increase throughput further - https://www.postgresql.org/docs/current/wal-configuration.ht...
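To make that concrete, here's a rough sketch (assuming psycopg 3 and a DSN in a PG_DSN environment variable) of checking whether checkpoints are being forced by WAL volume rather than running on the timer, which is usually the first sign the checkpoint settings need raising:

    # Sketch: see whether checkpoints are WAL-forced ("requested") vs timed.
    # Assumes psycopg 3 and a DSN in the PG_DSN env var. Note: on PostgreSQL 17+
    # these counters live in pg_stat_checkpointer (num_timed / num_requested);
    # the query below targets the pre-17 pg_stat_bgwriter view.
    import os
    import psycopg

    with psycopg.connect(os.environ["PG_DSN"]) as conn:
        settings = dict(conn.execute(
            "SELECT name, setting FROM pg_settings "
            "WHERE name IN ('checkpoint_timeout', 'max_wal_size', "
            "'checkpoint_completion_target')"
        ).fetchall())
        timed, requested = conn.execute(
            "SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter"
        ).fetchone()

    print("settings:", settings)
    print(f"timed checkpoints: {timed}, requested (WAL-forced): {requested}")
    if requested > timed:
        print("Most checkpoints are WAL-forced; consider raising max_wal_size.")

If most checkpoints are requested rather than timed, the server is flushing far more often than checkpoint_timeout suggests, which caps write throughput.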
> we find a Postgres server can handle up to 144K of these writes per second. That’s a lot, equivalent to 12 billion writes per day.
Based on the graph shown, this is misleading at best, essentially false. Past 120K writes/s the p50 spikes from 10ms to 1s (one full second for a write!). That's a two-orders-of-magnitude latency jump, and an unacceptable one for an OLTP workload. It clearly shows the server is completely saturated, which is not an operational regime. Quoting 144K is like quoting the throughput of a highway at the moment traffic comes to a standstill.
Based on this graph, the highest number I'd quote is 120K. In practice you'd want to operate the server with a safe margin below the peak, but since this is a benchmark, let's call 120K the peak. And even p50 isn't really the right metric: it should be a higher percentile (say p95) that stays within reasonable latency bounds. But for the sake of not overcomplicating things, p50 can serve as a reference.
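To illustrate the point, here's a small sketch in plain Python (the latency samples are made up, purely for illustration) of picking the "peak" as the highest load step whose p95 latency stays under an SLO, instead of the step where throughput simply maxes out:

    # Sketch: pick "peak throughput" as the highest load step whose p95 latency
    # stays under a target, instead of the raw maximum throughput.
    # The latency samples below are hypothetical.
    def percentile(samples, pct):
        xs = sorted(samples)
        idx = min(len(xs) - 1, round(pct / 100 * (len(xs) - 1)))
        return xs[idx]

    SLO_P95_MS = 50.0

    # {writes_per_second: [latency samples in ms]} from a load sweep (made up)
    sweep = {
        100_000: [8, 9, 10, 11, 12, 14],
        120_000: [9, 10, 12, 15, 20, 35],
        144_000: [10, 15, 40, 200, 800, 1200],  # saturated: tail latencies near 1 s
    }

    usable = [rps for rps, lat in sweep.items() if percentile(lat, 95) <= SLO_P95_MS]
    print("quotable peak:", max(usable) if usable else "none within SLO")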
> We found that the bottleneck was in flushing the Postgres write-ahead log (WAL) to disk
Therefore, you are not measuring Postgres's peak performance, but rather Postgres's performance under the IO constraints of this particular system. Certainly, 120K IOPS is the maximum this particular instance can sustain, but it doesn't show whether Postgres could do better on more performant storage. A good follow-up would have been to try the next instance size up (db.m7i.48xlarge) with 240K IOPS and see whether performance doubles (within the same p50 latency envelope) or not, and after that to test on an instance with local NVMe (which you won't find in RDS).
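As a sanity check on the WAL-flush hypothesis, something like this sketch (assuming psycopg 3, PostgreSQL 14+ for pg_stat_wal, and track_wal_io_timing = on; the column set may vary by major version) would show how much of the benchmark window is going into WAL syncs:

    # Sketch: snapshot WAL write/sync counters around a benchmark run to see
    # whether WAL flushing is the bottleneck. Assumes psycopg 3, PostgreSQL 14+
    # (pg_stat_wal) and track_wal_io_timing = on for the *_time columns.
    import os
    import psycopg

    QUERY = """
    SELECT wal_bytes, wal_write, wal_sync, wal_write_time, wal_sync_time
    FROM pg_stat_wal
    """

    def wal_snapshot(dsn):
        with psycopg.connect(dsn) as conn:
            cols = ["wal_bytes", "wal_write", "wal_sync",
                    "wal_write_time", "wal_sync_time"]
            return dict(zip(cols, conn.execute(QUERY).fetchone()))

    dsn = os.environ["PG_DSN"]
    before = wal_snapshot(dsn)
    input("Run the benchmark now, then press Enter...")
    after = wal_snapshot(dsn)

    print({k: after[k] - before[k] for k in before})
    # If wal_sync_time (ms) dominates the benchmark wall-clock time, the run is
    # bound by the fsync latency of this disk, not by Postgres itself.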
From [1]:

> Postgres insert throughput
> uv run python benchmarks/postgres_insert.py --rps 1000 --duration 300
A 300-second test duration?? That's not operationally representative. You are not accounting for checkpoints, the background writer, and especially autovacuum. Given that the workload pattern includes UPDATEs, you must validate bloat generation (or, equivalently, bloat removal) by a) observing much longer periods of time (e.g. 1h) and b) making sure the autovacuum configuration (and/or per-table vacuum settings, if required) keeps bloat contained in a stable way. Otherwise, the performance numbers shown will degrade over time, making them unrealistic.
Only after proper autovacuum tuning, and under the effects of the bgwriter (also tuned!) and checkpoints (all of which matter, especially for write-intensive workloads), over much longer periods of time, could the benchmark be considered meaningful.
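For a longer run, here's a sketch of the kind of periodic check I mean (assuming psycopg 3; the table name "tasks" is just a placeholder for whatever the benchmark writes to):

    # Sketch: poll dead-tuple counts and autovacuum activity during a long
    # benchmark run to verify bloat stays contained. Assumes psycopg 3;
    # "tasks" is a placeholder table name.
    import os
    import time
    import psycopg

    QUERY = """
    SELECT n_live_tup, n_dead_tup, autovacuum_count, last_autovacuum
    FROM pg_stat_user_tables
    WHERE relname = %s
    """

    def watch(dsn, table, interval_s=60, duration_s=3600):
        with psycopg.connect(dsn) as conn:
            end = time.monotonic() + duration_s
            while time.monotonic() < end:
                live, dead, av_count, last_av = conn.execute(QUERY, (table,)).fetchone()
                ratio = dead / max(live, 1)
                print(f"live={live} dead={dead} dead/live={ratio:.2%} "
                      f"autovacuums={av_count} last={last_av}")
                time.sleep(interval_s)

    if __name__ == "__main__":
        watch(os.environ["PG_DSN"], "tasks")
        # A steadily growing dead/live ratio means autovacuum isn't keeping up
        # and the early throughput numbers won't hold.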
> We next measure the scalability of Postgres-backed queues.
I'd recommend benchmarking the recently announced PgQue project, which operates bloat-free (bloat being one of the biggest operational hurdles in queue-like Postgres workloads). See [2] for a previous discussion on HN.
[1]: https://github.com/dbos-inc/dbos-postgres-benchmark

[2]: https://news.ycombinator.com/item?id=47817349
Write QPS isn't what you want to measure; you should look at index fragmentation, dead-tuple counts, etc. All of those generally go in directions you don't want.
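If you want actual numbers rather than planner estimates, a rough sketch using the pgstattuple extension (assuming it's installed and you have permission to create it; "tasks" and "tasks_pkey" are placeholder names):

    # Sketch: measure table bloat and B-tree fragmentation directly with the
    # pgstattuple extension. Assumes psycopg 3 and that pgstattuple is available;
    # "tasks" and "tasks_pkey" are placeholders.
    import os
    import psycopg

    with psycopg.connect(os.environ["PG_DSN"]) as conn:
        conn.execute("CREATE EXTENSION IF NOT EXISTS pgstattuple")
        dead_pct, free_pct = conn.execute(
            "SELECT dead_tuple_percent, free_percent FROM pgstattuple('tasks')"
        ).fetchone()
        leaf_density, leaf_frag = conn.execute(
            "SELECT avg_leaf_density, leaf_fragmentation FROM pgstatindex('tasks_pkey')"
        ).fetchone()

    print(f"table: {dead_pct:.1f}% dead tuples, {free_pct:.1f}% free space")
    print(f"index: {leaf_density:.1f}% leaf density, {leaf_frag:.1f}% fragmentation")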
Yes, you can scale it quite well vertically.
But how about horizontally? It would be nice to have high availability, or even to be able to upgrade the OS and postgres itself without downtime.
DBOS is amazing when it comes to Durable Workflows. There are others in the space - the most popular being Temporal - but I'd argue Temporal is also the most complicated one. I often say Temporal is like Kubernetes while DBOS is like `docker compose`. (And for those taking me literally: yes, you can use DBOS in Kubernetes!)
I don't understand why DBOS isn't nearly as popular as Temporal, but it has made a world of difference for building Durable Queues and long-running Durable Workflows in Python (it supports other languages too).
As they show in this article, Postgres scales impressively well (4 billion workflows per day on a db.m7i.24xlarge, enough for most applications), which is why, if you have your PostgreSQL backup/restore strategy dialed in, you should really take a close look at DBOS to handle your cloud-agnostic or self-hosted Durable Queues and Durable Workflows. It's an amazing piece of software founded by the original author of Ingres (the precursor to Postgres). The story of DBOS itself is captivating; I believe it started from being unable to scale Spark job scheduling.
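For anyone curious what that looks like in practice, here's a rough sketch of a durable workflow, written from memory of the dbos Python package's decorator API (exact names and setup may differ between versions, so treat this as illustrative rather than canonical):

    # Rough sketch of a DBOS durable workflow in Python, from memory of the
    # dbos package's decorator API -- names/config may differ by version.
    from dbos import DBOS

    DBOS()  # uses the Postgres instance configured for the app

    @DBOS.step()
    def charge_card(order_id: str) -> str:
        # Each step's completion is recorded in Postgres, so a crash here
        # resumes from the last completed step instead of re-running everything.
        return f"charged-{order_id}"

    @DBOS.step()
    def send_receipt(charge_ref: str) -> None:
        print(f"receipt sent for {charge_ref}")

    @DBOS.workflow()
    def checkout(order_id: str) -> None:
        ref = charge_card(order_id)
        send_receipt(ref)

    if __name__ == "__main__":
        DBOS.launch()
        checkout("order-42")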
Of course, there are many plugins and similar things.
And does Postgres backup scale?
When discussing databases, it gets interesting not because of the DB itself but because of the people trying to ask infeasible questions.
"Overall, we find a Postgres server can handle up to 144K of these writes per second. That’s a lot, equivalent to 12 billion writes per day."
Based on a problem I'm facing with Postgres today, I wonder if this really scales as linearly as the article makes it out to be.
We're in the middle of evaluating Postgres as a replacement for MySQL, and we see a notable slowdown for plain multi-row inserts due to index growth as soon as the table reaches just a couple of dozen million rows. It's an uncomplicated, flat (no constraints or foreign keys, etc.) medium-width table of about 10-15 columns with a handful of non-composite btree indices - and/or hash indices; we've tried mixing and matching just to see what happens - but ingestion drops to less than half before even 50m rows. At 100m rows insertion performance is down to a fraction, and from there it just gets worse as the table and its indices grow. It's as if there's some specific cut-off point beyond which everything goes awry. However, if we simply remove all indices from the table, Postgres will happily insert hundreds of millions of rows at a steady, near-identical pace from start to end. The exact same table and indices on MySQL, matched as closely as we can between MySQL and Postgres, running on the same OS and hardware, maintain more or less linear insertion performance well beyond 500m rows.
Now, there's a lot to be said about the whys and why-nots of keeping tables of this size in an RDBMS and relying on that in application design, and probably a fair amount more about tuning Postgres's config, but we're stumped as to why PG's indexing performance falters this early when contrasted against InnoDB/MySQL. 50-100m rows really isn't much. I'd greatly appreciate it if anyone with insight could shed some light on this and maybe offer a few ideas to test out.
(add.: during these stress tests the hardware is nowhere close to saturated; there's consistent headroom on memory, CPU and disk I/O)
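One way to narrow this down: time each insert batch as the table grows and compare the point where throughput falls off against the point where the combined index size outgrows shared_buffers (and, later, the OS page cache). With random key order, every insert then touches cold index leaf pages; InnoDB's change buffer absorbs much of that secondary-index maintenance, which could explain the gap versus MySQL. A rough harness (assuming psycopg 3; table and column names are placeholders):

    # Sketch: measure per-batch insert time as an indexed table grows, to find
    # where throughput starts to fall off. Assumes psycopg 3; "ingest_test" and
    # its columns are placeholders for the real schema. Compare the knee in
    # rows/s against pg_indexes_size vs shared_buffers.
    import os
    import random
    import time
    import psycopg

    BATCH = 10_000
    BATCHES = 5_000  # ~50m rows total

    with psycopg.connect(os.environ["PG_DSN"], autocommit=True) as conn:
        for i in range(BATCHES):
            rows = [(random.randrange(10**12), f"val-{random.random()}")
                    for _ in range(BATCH)]
            t0 = time.monotonic()
            with conn.cursor() as cur:
                cur.executemany("INSERT INTO ingest_test (k, v) VALUES (%s, %s)", rows)
            elapsed = time.monotonic() - t0
            if i % 100 == 0:
                idx_size = conn.execute(
                    "SELECT pg_size_pretty(pg_indexes_size('ingest_test'))"
                ).fetchone()[0]
                print(f"batch {i}: {BATCH / elapsed:,.0f} rows/s, index size {idx_size}")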