
srameshc · last Friday at 6:07 PM · 3 replies

I love SQLite, and I'm in no way trying to devalue it; the author's method is an excellent approach for getting analytical speed out of SQLite. But I've been loving DuckDB for similar analytical workloads, since it's built for exactly these tasks. Like SQLite, DuckDB reads from a single file, and it processes large datasets at extreme speed. I work on my MacBook M2 and have been dealing with about 20 million records, and it's fast, very fast.

Getting data into DuckDB is super easy; I was surprised:

SELECT avg(sale_price), count(DISTINCT customer_id) FROM '/my-data-lake/sales/2024/*.json';
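
That query reads the files in place. A minimal sketch of actually persisting them as a table inside a single DuckDB database file (assumes a session opened against a persistent file, e.g. `duckdb analytics.duckdb`; the table name sales_2024 is my own):

-- materialize the JSON files into a local columnar table (table name is hypothetical)
CREATE TABLE sales_2024 AS
SELECT * FROM '/my-data-lake/sales/2024/*.json';

-- later queries hit the table instead of re-scanning the JSON files
SELECT avg(sale_price) FROM sales_2024;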

You can also load values into a JSON-typed column and query them with Postgres-style syntax like col->>'$.key'.
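
A small sketch of that JSON-column pattern (table, column, and key names are made up; the -> / ->> operators come from DuckDB's bundled json extension):

CREATE TABLE events (id INTEGER, payload JSON);
INSERT INTO events VALUES (1, '{"customer": {"id": 42, "tier": "gold"}}');

-- -> extracts JSON, ->> extracts text, using a JSONPath-style key
SELECT payload->'$.customer'       AS customer_json,
       payload->>'$.customer.tier' AS tier
FROM events;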


Replies

loa_observer · last Friday at 7:03 PM

DuckDB is super fast for analytic tasks, especially when you pair it with a visual EDA tool like pygwalker. It lets you handle millions of rows of visuals and EDA in seconds.

But I would say comparing DuckDB and SQLite is a little unfair. I would still use SQLite to build systems in most cases, and DuckDB only for analytics; you can hardly get a smooth deployment across many platforms if your app bundles DuckDB.

mikepurvis · last Friday at 6:21 PM

Whoa. Is that first query building an index of random filesystem json files on the fly?

lame_lexem · last Friday at 6:40 PM

Can we all agree to never store datasets uncompressed? DuckDB supports reading many compression formats.
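
For example, a quick sketch reusing the glob from the top comment (the .gz filenames and the output Parquet name are hypothetical; DuckDB detects gzip from the extension):

-- read gzipped JSON directly
SELECT count(*) FROM read_json_auto('/my-data-lake/sales/2024/*.json.gz');

-- or convert once to compressed Parquet and query that instead
COPY (SELECT * FROM '/my-data-lake/sales/2024/*.json')
TO 'sales_2024.parquet' (FORMAT PARQUET, COMPRESSION ZSTD);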
