logoalt Hacker News

jackfischertoday at 6:37 PM2 repliesview on HN

Congrats guys! Curious how the read write splitting is reliable in practice due to replication lag. Do you need to run the underlying cluster with synchronous replication?


Replies

maherbegtoday at 9:11 PM

The way we solved it is by checking the lsn on the primary, and then waiting for the replica to catch up to that lsn before doing reads on the replica in various scenarios.

levkktoday at 6:41 PM

Not really, replication lag is generally an accepted trade-off. Sync replication is rarely worth it, since you take a 30% performance hit on commits and add more single points of failure.

We will add some replication lag-based routing soon. It will prioritize replicas with the lowest lag to maximize the chance of the query succeeding and remove replicas from the load balancer entirely if they have fallen far behind. Incidentally, removing query load helps them catch up, so this could be used as a "self-healing" mechanism.

show 1 reply