vasco • 05/04/2025 • 2 replies

I haven't yet understood this pattern (and I tried using DuckDB). Unless you're only ever going to query those files once or twice in your life, importing them into Postgres shouldn't take that long, and then you can do as much as, or more than, you can with DuckDB.

Also, as a side note: is everyone just using DuckDB in memory? As soon as you want multiple sessions, I'd assume you'd run DuckDB on top of a local database, so again I don't see the point, but I'm sure I'm missing something.

Replies

wenc • 05/04/2025

> importing them into postgres shouldn't be that long and then you can do the same or more than with DuckDB.

Usually new data is generated regularly, and getting it into Postgres would require a separate ETL process. With DuckDB, no ETL is needed: new Parquet files are simply read off the disk.

> Also as a side note, is everyone just using DuckDB in memory?

DuckDB is generally used in single-user setups, and yes, the in-memory use case is the most common. I'm not sure about use cases where a single user needs multiple sessions, but DuckDB does have read concurrency, session isolation, etc. I believe write serialization is supported across multiple sessions.

With Parquet files, data is effectively append-only, so the "write" use cases tend to be more limited. Generally another process generates the Parquet files, and DuckDB just reads them.


lugarlugarlugar • 05/04/2025

I think the main point is not having to store a duplicate of the 600GB of input data.
