Hacker News

DuckDB Internals: Why Is DuckDB Fast? (Part 1)

362 points by marklit last Tuesday at 11:07 AM | 117 comments

Comments

smithclay today at 5:21 AM

If you're reading this and curious: consider writing a DuckDB community extension [1] or contributing to an existing one [2].

DuckDB is becoming a kind of data superglue between a lot of data ecosystems (GIS, observability, analytics, lakehouses, object storage, etc.) that typically don't talk to each other, and it's worth checking out in 2026.

[1] https://github.com/duckdb/extension-template
[2] https://duckdb.org/community_extensions/

axegon_ today at 1:09 PM

I use DuckDB HEAVILY at work and it's been a game changer. I'm sifting through terabytes of data multiple times a day: mixing, matching, updating, filtering. DuckDB is second to none. For anyone who hasn't used it: you are missing out.

0xferruccio today at 5:35 AM

DuckDB is amazing for any sort of fast data analysis when the data is small enough that it can fit on your laptop

Recently at work I've been using it to analyse the Claude Code sessions of every engineer at our company (that we upload to S3), and it's been extremely helpful in finding gaps in devex and getting clear metrics to back up the impact of fixing them

Another thing it's been really useful for is getting metrics on Claude skills usage and then diving into use cases by looking at the transcripts

Other engineers that had never touched DuckDB were so impressed with how easy it is for AI agents to write queries on our dataset

anitil today at 4:48 AM

DuckDB makes so much of my life easier, though I've never used it for large problems. The ability to run `select * from 'data.json'` is just lovely. The fact that it's also a powerhouse is so impressive: I'd usually expect a project to be good at small problems (like mine) xor large problems, but not both.

steve_adams_86 today at 4:37 AM

> DuckDB has received widespread adoption because it's just so damn easy to use.

This was a major factor in my initial adoption. Since then it has stuck because it’s also absurdly capable, versatile, and fast.

If it wasn’t so easy to use I suspect I wouldn’t have adopted it when I did. The ergonomics are crazy. It still impresses me regularly.

romaniv today at 1:58 PM

It's an interesting project, but the discussion on HN looks weird. It gets brought up every few weeks[1] and everyone just spams comments with messages about how "fast" it is.

DuckDB is fast for some specific workloads. If you use it for most other things, it is at least an order of magnitude slower than SQLite. It also has some limitations in terms of what SQL it will currently run (e.g. I immediately ran into an issue with recursive queries). That will probably get better with time.

[1] If you search HN for "sqlite" and "duckdb" you get 4,310 hits and 2,398 hits respectively. That's a very heavy skew, considering SQLite is everywhere and has been around for a quarter century, while DuckDB effectively appeared on the scene two years ago.

willtemperley today at 8:02 AM

One huge caveat: for anyone who cannot use dynamic linking, e.g. in an App Store context, DuckDB isn’t a great choice. It’s very hard to statically link extensions.

This is where Arrow wins I think. Arrow CPP for example has very portable builds and the C interface is very usable for building bindings.

DuckDB is excellent, but it’s more a black box than a library.

Edit: after a conversation with a robot, it would seem that the DuckDB and Arrow CPP C APIs are complementary, so it's very possible for Arrow CPP and DuckDB to coexist in an app, each with its own strengths. Arrow CPP doesn't have a simple SQL story, for example.

jdw64 today at 5:14 AM

The data scientists I work with use this. Why do they use it? I don't really know much about it, but I've noticed they use it quite often. I mainly use MySQL or PostgreSQL. What are the advantages of DuckDB? It seems like they usually use it as an alternative to Pandas.

pedromlsreis today at 2:55 PM

DuckDB is a great example of how far you can get by removing unnecessary layers... Columnar layout and vectorized execution are a powerful combination for OLAP workloads.
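
For readers who haven't met the term, here is a minimal sketch of what "vectorized execution" means, in plain Python. It is not DuckDB's implementation: the batch size, the function names, and the data are all made up for illustration. The idea it tries to show is that an operator is called once per batch of column values rather than once per row.

```python
from array import array

BATCH_SIZE = 4  # hypothetical; picked only to keep the demo small


def batches(column, size=BATCH_SIZE):
    """Yield consecutive slices of a column, one batch at a time."""
    for start in range(0, len(column), size):
        yield column[start:start + size]


def filter_greater_than(batch, threshold):
    """Filter operator: one call processes a whole batch of values."""
    return array("d", (v for v in batch if v > threshold))


# A single column of a hypothetical table, stored contiguously.
prices = array("d", [1.0, 12.0, 7.5, 30.0, 2.0, 15.0])

total = 0.0
calls = 0
for batch in batches(prices):
    kept = filter_greater_than(batch, 10.0)  # filter, then aggregate
    total += sum(kept)
    calls += 1

print(total, calls)  # 57.0 2 (two operator calls for six values)
```

In an interpreter like Python this gains little, but in a compiled engine the per-call overhead is paid once per batch, and the inner loop runs over a tight array of same-typed values.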

ilia-a today at 1:20 PM

DuckDB is really neat. I recently added a PDO interface for it for PHP: https://github.com/iliaal/pdo_duckdb

Still a bit raw, but getting there

snissn today at 5:56 AM

I'm just curious - is DuckDB too slow for people? This benchmark from ClickHouse shows it being fairly slow compared to some options: https://jsonbench.com/

mcv today at 9:08 AM

Is everything becoming columnar? Parquet stores data per column instead of per row because it improves compression. I get that. Arrow apparently is columnar, and now DuckDB also gets its efficiency by treating data as columns instead of rows?

I still need to wrap my head around how that works, but it's a fascinating development.
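
One rough way to picture the row-versus-column difference is the sketch below, in plain Python. It is a toy under stated assumptions, not Parquet's, Arrow's, or DuckDB's actual format: the table, its field names, and the run-length step are invented for illustration.

```python
from array import array
from itertools import groupby

# Hypothetical toy table with fields (id, country, price).
rows = [
    (1, "NL", 9.5),
    (2, "NL", 20.0),
    (3, "NL", 4.25),
    (4, "US", 8.0),
]

# Row layout: each record's fields sit together, so summing one
# field still walks every whole record.
total_from_rows = sum(row[2] for row in rows)

# Column layout: each field lives in its own sequence of same-typed values.
columns = {
    "id": array("q", (row[0] for row in rows)),
    "country": [row[1] for row in rows],
    "price": array("d", (row[2] for row in rows)),
}

# The same aggregate now touches only the "price" column.
total_from_columns = sum(columns["price"])

# Similar values end up adjacent, so simple schemes such as
# run-length encoding can shrink a column.
country_runs = [
    (value, len(list(group))) for value, group in groupby(columns["country"])
]

print(total_from_columns)  # 41.75
print(country_runs)        # [('NL', 3), ('US', 1)]
```

Analytical queries tend to read a few columns across many rows, which is the access pattern the column layout favors.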

tdhz77 today at 3:57 PM

Is DuckDB multi-region active-active?

bunsenhoneydew today at 7:34 AM

DuckDB is a fantastic piece of tech. One of the best, if not the best, I’ve found in several years.

Panzerschrek today at 5:49 AM

If DuckDB is so fast and has no data transfer overheads, does it need all this typical SQL machinery with filtering and joining via SELECT queries? Wouldn't it be simpler and faster to return all data to the caller code (all table rows, but only requested columns) and let it perform all other necessary data processing logic?

pknerd today at 5:48 AM

FTA:

> ..In-process means there's no server. You don't connect to DuckDB; you load it as a library inside your program, the same way you'd load NumPy or Polars

Does that mean it can perform all statistical computations as well, if I want to use it for algo trading?

sigbottle today at 1:37 PM

What the fuck.

I've never been that strong of an engineer. TIL that at one of my internships I was building DuckDB but for the company's private use cases. Well, trying to anyways. I didn't really get the whole picture, the pieces did not fit into place.

Didn't get the return offer obviously, probably because I didn't make the connection (or really a coherent narrative of what I was building). RIP. You live and you learn, I guess.

thefourthchime today at 4:42 AM

I’m a huge fan, and I’ve been wanting to get to know the internals. Looking forward to digging in.

codingbear today at 5:24 AM

DuckDB is so nice coupled with Claude Code. Its extensive file support and some very interesting decisions on locally caching data (like from S3 or Snowflake) make it easy to slice and dice almost any kind of tabular data.

holografix today at 5:48 AM

Why is DuckDB so popular when one can use Python + Pandas?

Better perf + SQL, is that mostly it?

f311a today at 6:04 AM

I wish this article was not LLM written

bunbun69 today at 3:41 PM

Holy LLM slop article…

pknerd today at 5:52 AM

umm can we say it can replace SQLite?

tobyhinloopen today at 8:23 AM

The only reason I know and use DuckDB is because my (internal, private-use-only, experimental) vibe coded projects use it a ton. I didn't pick it - LLMs did. Until this article, I wasn't aware of what it actually is capable of.

Most of these projects use JSON(L) files for storage, and duckdb to process them.
