KraftyOne · today at 3:08 AM

> Based on the shown graph, this is misleading at best, essentially false. After 120K writes/s, p50 spikes from 10ms to 1s (1 second for a write!!!!). That's a two-orders-of-magnitude latency spike, and an unacceptable one for an OLTP workload. It clearly shows the server is completely saturated, which is a non-operational regime. Quoting 144K is equivalent to quoting the throughput of a highway at the moment traffic comes to a standstill.

> Based on this graph, the highest number I'd quote is 120K. And you probably want to keep operating the server within a safe margin below peak, but since this is a benchmark, let's call 120K the peak. Actually, p50 is not even the right cut-off: it should be a higher percentile (say p95) at which latency stays within reasonable bounds. But for the sake of not overcomplicating things, p50 can serve as a reference.

You definitely don't want to run a production system at saturation! But it's worthwhile to measure a complex system like Postgres at saturation, see when it gets there and how it behaves there, and then run at a slightly lower throughput.
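
Concretely, what I mean is a rate sweep rather than uncapped load--something like this rough sketch (the client counts, rates, and database name are placeholders):

  # Sweep the offered load with pgbench's rate limiter and watch where
  # latency starts to degrade; then operate below that knee.
  for rate in 80000 100000 120000 140000; do
    pgbench -c 64 -j 16 -T 300 --rate=$rate --progress=10 bench | grep latency
  done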

> Therefore, you are not measuring Postgres peak performance, but rather Postgres performance under the IO constraints of this particular system. Certainly, 120K IOPS is the maximum this particular instance can deliver. But it doesn't show whether Postgres could do better with more performant disk IO. Actually, a good test would have been to try the next instance size (db.m7i.48xlarge) with 240K IOPS and see whether performance doubles (within the same p50 latency envelope) or not. And after that, to test on an instance with local NVMe (you won't find this in RDS).

I've done some testing (not in the blog post)--doubling instance size/IOPS doesn't improve performance significantly because it doesn't affect the WAL bottleneck. Local NVMe should have a significant impact in theory, but I haven't tested this myself.
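
For what it's worth, a quick way to watch WAL pressure during a run is pg_stat_wal (a sketch, assuming PostgreSQL 14 or newer and a database named bench):

  # wal_buffers_full counts writes forced by the WAL buffers filling up,
  # one hint that the WAL path, rather than data-file IO, is the limit.
  psql -d bench -c "SELECT wal_records, wal_bytes, wal_buffers_full FROM pg_stat_wal;"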

> 300 seconds test duration?? This is not operational. You are not accounting for checkpoints, the background writer, and especially autovacuum. Given that the workload pattern includes UPDATEs, you must validate bloat generation (or, equivalently, bloat removal) by a) observing much longer periods of time (e.g. 1h) and b) making sure the autovacuum configuration (and/or individual table vacuum configuration if required) keeps bloat contained in a stable way. Otherwise, the shown performance numbers will degrade over time, making them unrealistic.

Those are usage examples (notice the 1000 rps)--the actual benchmarks were run for, and were stable over, much longer durations.


Replies

ahachete · today at 3:49 AM

> You definitely don't want to run a production system at saturation! But it's worthwhile to measure a complex system like Postgres at saturation, see when it gets there and how it behaves there, and then run at a slightly lower throughput.

I disagree. A number at saturation is worthless, because "a slightly lower throughput" is at best unqualified hand-waving. Real numbers can be quite far from that saturation point.

Quote real production numbers instead. You can define them clearly; it's not that hard. E.g.: the throughput sustained with p95 latency below 10ms. That's it. Measure and report that number.
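
For example, something along these lines with pgbench's per-transaction logging (an untested sketch; the client counts, target rate, and database name are made up):

  # Log every transaction, then report p95 latency at a fixed offered rate
  # instead of the throughput at which the server falls over.
  pgbench -c 64 -j 16 -T 1800 --rate=100000 --log bench
  # Field 3 of each pgbench_log line is the transaction time in microseconds.
  awk '{print $3}' pgbench_log.* | sort -n \
    | awk '{v[NR]=$1} END {printf "p95 = %.2f ms\n", v[int(NR*0.95)]/1000}'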

> I've done some testing (not in the blog post)--doubling instance size/IOPS doesn't improve performance significantly because it doesn't affect the WAL bottleneck. Local NVMe should have a significant impact in theory, but I haven't tested this myself.

But those would be interesting numbers to share! "Doesn't improve performance significantly"--sorry, I'm not a big fan of unqualified data points. Is it 10%, 20%, 50%? And certainly, when measured at saturation, you won't see improvements. But measured in an operational regime, you should probably see notable improvements (unless other scaling factors start to dominate, in which case your benchmark becomes much more meaningful, because then you are finding Postgres's scaling limits and not just the limits of the disk it runs on). That changes the picture dramatically.

> Those are usage examples (notice the 1000 rps)--the actual benchmarks were run for, and were stable over, much longer durations.

Sorry, but if you use that as an example, it gives me little confidence about the real intent. Glad to hear you ran at longer durations, though--add that information to the post! But again, that's not enough. Show the bloat and demonstrate that it stays stable, given the tuning required to keep it contained, of course. Also show the tps over time--I'm sure it drops notably during checkpoints--and then the "under 10ms at p95" number will become dominated by write performance during checkpoints.
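
To be concrete about what I'd want to see, something like this (a sketch; bench stands in for whatever database the benchmark uses):

  # Dead-tuple ratios show whether autovacuum keeps bloat contained over time.
  psql -d bench -c "
    SELECT relname, n_live_tup, n_dead_tup,
           round(100.0 * n_dead_tup / greatest(n_live_tup, 1), 1) AS dead_pct,
           last_autovacuum
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC;"
  # And enable checkpoint logging so tps dips can be correlated with checkpoints.
  psql -d bench -c "ALTER SYSTEM SET log_checkpoints = on;"
  psql -d bench -c "SELECT pg_reload_conf();"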

Because when you determine your SLOs, you don't do it at the happy path--quite the opposite. And saying "Postgres can do 144K writes/sec on this machine" is beyond even the happy path, so it's not meaningful to me.