Hacker News

jdw64 today at 5:14 AM (3 replies)

The data scientists I work with use this. Why do they use it? I don't really know much about it, but I've noticed they use it quite often. I mainly use MySQL or PostgreSQL. What are the advantages of DuckDB? It seems like they usually use it as an alternative to Pandas.


Replies

medvezhenok today at 5:32 AM

DuckDB has been probably my most used tool in 2026 - if you're comfortable with SQL it's incredible at quickly prototyping and slicing / dicing data.

I do a lot of experiments with regexes, and if you get used to the RE2 syntax that DuckDB uses, you can see up to a 10-100x speedup compared to Postgres on things like regexp_matches(), regexp_extract(), etc. (depending on query/table/machine specifics). It has quite powerful scripting with custom macros, and it fixes a lot of annoyances of SQL for me compared to Postgres.

I think if you have access to a machine with a lot of RAM / cores and a beefy data set, then it's basically like a RAMdisk version of Snowflake running locally on your machine.

(and of course the fact that it makes it convenient to read CSV/parquet, read/write from S3, etc) - it's a very ergonomic tool.

Demiurge today at 5:31 AM

Here is the thing: it’s a write-only single-file format. If you need to run analytical queries, it’s optimized for reading; you just open a file and query for the parts you want. If you have multiple clients that read and write data to the database, you should use PostgreSQL.

It’s not really a database in the traditional sense. There is no ACID complexity; it’s a library that lets you write SQL to query a tabular data file.

bdcravens today at 5:20 AM

Primarily the ability to work directly with data in its native format (CSV for example) without needing ETL.
