logoalt Hacker News

saisrirampurlast Friday at 10:27 PM1 replyview on HN

Great question! If you’re starting a greenfield application, pg_clickhouse makes a lot of sense since you’ll be using a unified query layer for your application.

Now, coming to your question about replication: you can use PeerDB (acquired by ClickHouse https://github.com/PeerDB-io/peerdb), which is laser-focused and battle-tested at scale for Postgres-to-ClickHouse replication. Once the data is replicated into ClickHouse, you can start querying those tables from within Postgres using pg_clickhouse. In ClickHouse Cloud, we offer ClickPipes for Postgres CDC/replication, which is a managed service version of PeerDB and is tightly integrated with ClickHouse. Now there could be non-transcational tables that you can directly ingest to ClickHouse and still query using pg_clickhouse.

So TL;DR: Postgres for OLTP; ClickHouse for OLAP; PeerDB/ClickPipes for data replication; pg_clickhouse as the unified query layer. We are actively working on making this entire stack tightly integrated so that building real-time apps becomes seamless. More on that soon! :)


Replies

oulipo2last Saturday at 9:27 AM

Nice! Right now I'm using Timescaledb, do you think it makes sense to move to a Postgres+CH setup instead? or only if I hit the limit of timescaledb?

Also what would be the benefit for me of querying clickhouse from Postgres, rather than directly through my backend via an ORM/SDK? is that because it would allow me to do JOINs?

What would be the typical setup if I want to JOIN analytical data (eg my IoT device readings) from CH with some business data (eg the user owning the device) from my Postgres? Would I replicate that business data to CH to do the join there, or would that be typically the exact use-case for pg_clickhouse?

show 1 reply