> Note that the local NVMe setup does have backups and WAL archival to S3, which provides data durability with an RPO of tens of seconds.
That's good; it provides a reasonable RPO, although I'd prefer zero. But with this setup the RTO (or, equivalently, the downtime) is potentially quite large.
> Even with an HA setup, I expect the performance difference to be similar across systems (maybe slightly smaller).
I have run some benchmarks and observed a 24-27% performance penalty for semi-sync replication on instances using local NVMe. That's why I say this is non-zero.
> Regarding default configs, that was intentional: default tuning is a differentiator across services, and that needs to be measured. However, we plan to add more configurability for Postgres tuning in the future.
That'd be great.
Ack, your insight is very interesting.
Back at Citus/Microsoft, we typically saw around a 30% performance drop with synchronous replication on EBS-backed Postgres. I’d expect something in that ballpark for RDS and Crunchy as well. For NVMe-backed Postgres, we haven’t yet measured the impact of quorum-based replication, and it’s certainly possible the overhead ends up being higher than 30%.
That said, the single-node margins are already quite substantial, over 2× in all cases and up to 5× versus RDS in our benchmarks. Even with a meaningful HA penalty, NVMe-backed setups could still remain very compelling from a performance perspective. We’ve just started running HA benchmarks, so stay tuned.
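A rough back-of-the-envelope sketch of that argument, assuming the HA penalty simply scales throughput down multiplicatively on each side. The inputs are the figures quoted in this thread (the 2× single-node margin, ~27% as the upper end of the semi-sync penalty observed on NVMe, ~30% for sync replication on EBS); `ha_margin` and `break_even_penalty` are hypothetical helpers for illustration, and the outputs are arithmetic, not measured HA results.

```python
def ha_margin(single_node_speedup: float, nvme_ha_penalty: float,
              baseline_ha_penalty: float) -> float:
    """Relative throughput of HA NVMe vs. HA baseline (e.g. EBS).

    Assumes each HA penalty is a fractional throughput loss applied
    on top of the single-node numbers (a simplifying assumption).
    """
    return single_node_speedup * (1 - nvme_ha_penalty) / (1 - baseline_ha_penalty)


def break_even_penalty(single_node_speedup: float, baseline_ha_penalty: float) -> float:
    """NVMe HA penalty at which the margin over the baseline drops to 1x."""
    return 1 - (1 - baseline_ha_penalty) / single_node_speedup


# 2x single-node margin, ~27% NVMe semi-sync penalty, ~30% EBS sync penalty
print(round(ha_margin(2.0, 0.27, 0.30), 2))         # 2.09
# How large the NVMe HA penalty would have to be to erase a 2x / 5x margin
print(round(break_even_penalty(2.0, 0.30), 2))      # 0.65
print(round(break_even_penalty(5.0, 0.30), 2))      # 0.86
```

Under these assumptions, the NVMe HA penalty would need to exceed roughly 65% before a 2× single-node margin disappeared entirely.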
Side note: using local NVMe-backed Postgres for performance is not new. Many enterprise companies, like Datadog and Instacart, run their performance-critical services on it, though self-managed.
Regarding RTO for single-node setups: it wouldn't be great in most systems (at least minutes), since recovery still has to happen from backups.
Overall, very useful feedback. Thanks again for chiming in!