Hacker News

oulipo2 · last Friday at 10:11 PM · 3 replies

I'm using Postgres as my base business database, and I'm now thinking about linking it to either DuckDB/DuckLake or ClickHouse...

what would you recommend and why?

I understand that part of the appeal of pg_clickhouse is being able to run pre-existing Postgres queries against an analytical database without changing anything. So if I'm building my database now and have no legacy, would pg_clickhouse still make sense, or should I do analytics differently?

Also, would you have some kind of tutorial / sample setup of a typical business application in Postgres with replication into ClickHouse for analytics queries, so I can see how ClickHouse is typically used?


Replies

jascha_eng · last Friday at 11:30 PM

Depending on your workload, you might also be able to use Timescale to get very fast analytical queries directly inside Postgres. That avoids having to replicate the data altogether.

Note that I work for the company that built Timescale (Tiger Data). ClickHouse is cool though; just throwing another option into the ring.

To be fair, in terms of raw speed ClickHouse pulls ahead on most benchmarks. But if you want to do a lot of joins against your Postgres data directly, you might benefit from having everything in one place, and of course you avoid the sync overhead.

mritchie712 · last Friday at 11:02 PM

We released a Meltano target for DuckLake[0]; dlt has one now too. Pretty easy to sync pg -> ducklake.
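For a rough idea of what that sync looks like, here's a minimal Meltano project sketch. tap-postgres is the standard Postgres extractor; the config keys shown for both plugins are illustrative assumptions, so check the target-ducklake docs linked in [0] for the real settings:

```yaml
# meltano.yml (sketch; config keys are assumptions, not the
# plugins' documented settings)
plugins:
  extractors:
    - name: tap-postgres
      config:
        sqlalchemy_url: postgresql://app:secret@localhost:5432/app
  loaders:
    - name: target-ducklake
      config:
        catalog_path: ./lakehouse   # where the DuckLake catalog/data would live
```

With both plugins installed, `meltano run tap-postgres target-ducklake` would execute the sync.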

I've been really happy with DuckLake, happy to answer any questions about it.

DuckDB has always felt easier to use than ClickHouse to me, but both are great options. If I were you, I'd try both for a few hours with your use case and pick the one that feels better.

0 - https://www.definite.app/blog/target-ducklake

saisrirampur · last Friday at 10:27 PM

Great question! If you're starting a greenfield application, pg_clickhouse still makes a lot of sense, since it gives you a unified query layer for your application.

Now, coming to your question about replication: you can use PeerDB (acquired by ClickHouse, https://github.com/PeerDB-io/peerdb), which is laser-focused on, and battle-tested at scale for, Postgres-to-ClickHouse replication. Once the data is replicated into ClickHouse, you can query those tables from within Postgres using pg_clickhouse. In ClickHouse Cloud, we offer ClickPipes for Postgres CDC/replication, a managed version of PeerDB that is tightly integrated with ClickHouse. There may also be non-transactional tables that you can ingest directly into ClickHouse and still query via pg_clickhouse.

So TL;DR: Postgres for OLTP; ClickHouse for OLAP; PeerDB/ClickPipes for data replication; pg_clickhouse as the unified query layer. We are actively working on making this entire stack tightly integrated so that building real-time apps becomes seamless. More on that soon! :)
