Hacker News

larsiusprime yesterday at 9:28 PM (8 replies)

“import geopandas” also exists and has for some time. Snark aside, WHAT is special about duckDB? I wish the author had actually shown some practical examples so I could understand their claims better.


Replies

maxxen yesterday at 10:31 PM

I replied to another comment, but I think a big part is that DuckDB's spatial extension provides a SQL interface to a whole suite of standard FOSS GIS packages by statically bundling everything (including inlining the default PROJ database of coordinate projection systems into the binary) and providing it for multiple platforms (including WASM). I.e., there are no transitive dependencies except libc.

Yes, DuckDB does a whole lot more: vectorized larger-than-memory execution, columnar compressed storage, and an ecosystem of other extensions that make it more than the sum of its parts. But while I've been working hard on making the spatial extension more performant and more broadly useful (I designed a new geometry engine this year, and spatial join optimization just got merged on the dev branch), the fact that you can, e.g., convert to and from a myriad of different geospatial formats by utilizing GDAL, transform through SQL, or pull down the latest Overture dump without having the whole workflow break just because you updated QGIS has probably been the main killer feature for a lot of the early adopters.

(Disclaimer: I work on duckdb-spatial @ duckdblabs)
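
For readers who want to see what the format conversion described above might look like, here is a minimal sketch in Python. It assumes the third-party `duckdb` package is installed and that the spatial extension can be downloaded on first use. The helper names (`sql_literal`, `build_conversion_sql`, `convert`) and any file names are hypothetical, made up for illustration; the SQL itself relies on the spatial extension's `ST_Read` table function and its GDAL-backed `COPY ... TO` support.

```python
def sql_literal(value: str) -> str:
    """Quote a Python string as a SQL string literal (doubles single quotes)."""
    return "'" + value.replace("'", "''") + "'"


def build_conversion_sql(src_path: str, dst_path: str, driver: str = "GeoJSON") -> str:
    """Build a COPY statement that reads a GDAL-readable file and writes it
    back out with another GDAL driver."""
    return (
        f"COPY (SELECT * FROM ST_Read({sql_literal(src_path)})) "
        f"TO {sql_literal(dst_path)} "
        f"WITH (FORMAT GDAL, DRIVER {sql_literal(driver)})"
    )


def convert(src_path: str, dst_path: str, driver: str = "GeoJSON") -> None:
    """Run the conversion in an in-memory DuckDB database.

    Assumption: the duckdb package is installed and the spatial extension
    can be fetched the first time INSTALL runs.
    """
    import duckdb  # imported lazily so the SQL builder works without it

    con = duckdb.connect()  # in-memory database
    try:
        con.execute("INSTALL spatial")
        con.execute("LOAD spatial")
        con.execute(build_conversion_sql(src_path, dst_path, driver))
    finally:
        con.close()
```

Calling something like `convert("parcels.shp", "parcels.geojson")` (hypothetical file names) would then read the shapefile and write GeoJSON. Any filtering or reshaping can go inside the `SELECT`, which is the "transform through SQL" part.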

jjtheblunt yesterday at 10:37 PM

DuckDB has Parquet support and can operate, in SQL syntax, on enormous 'tables' spread across huge collections of Parquet files as if they were one virtual file. I believe the underlying implication is that there are opportunities to leverage vector instructions on Parquet. It's very "handy".
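
As a rough illustration of querying many Parquet files as one table, here is a minimal Python sketch. It assumes the third-party `duckdb` package is installed; the helper names, the glob pattern, and the column name are hypothetical. It relies on DuckDB's `read_parquet` function accepting a glob pattern.

```python
def sql_literal(value: str) -> str:
    """Quote a Python string as a SQL string literal (doubles single quotes)."""
    return "'" + value.replace("'", "''") + "'"


def sql_identifier(name: str) -> str:
    """Quote a column name as a SQL identifier (doubles double quotes)."""
    return '"' + name.replace('"', '""') + '"'


def build_parquet_count_sql(glob_pattern: str, group_column: str) -> str:
    """Build a query that counts rows per group across every file matching the glob."""
    column = sql_identifier(group_column)
    return (
        f"SELECT {column}, count(*) AS n "
        f"FROM read_parquet({sql_literal(glob_pattern)}) "
        f"GROUP BY {column} ORDER BY n DESC"
    )


def run_parquet_count(glob_pattern: str, group_column: str) -> list:
    """Execute the query and return the rows as a list of tuples.

    Assumption: the duckdb package is installed and the glob matches
    Parquet files that share a compatible schema.
    """
    import duckdb  # imported lazily so the SQL builder works without it

    con = duckdb.connect()  # in-memory database
    try:
        return con.execute(build_parquet_count_sql(glob_pattern, group_column)).fetchall()
    finally:
        con.close()
```

With a hypothetical layout, `run_parquet_count("events/*.parquet", "country")` would treat every matching file as part of a single table and aggregate across all of them.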

getnormality yesterday at 10:32 PM

Everything is special about DuckDB. Pandas is way, way behind the state of the art in tabular data analysis.

dbreunig yesterday at 10:18 PM

Author here: what's special is that you can go from 0 to spatial data incredibly quickly, in the data generalist tool you're already using. It makes the audience of people working with geospatial data much bigger.

(Geopandas is great, too.)

joshvm yesterday at 10:17 PM

I haven't used DuckDB, but the real comparison is presumably PostGIS? It is also absent from the discussion, but I think it's what the author alludes to.

I have no major qualm with pandas and geopandas. However, I use them when they're the only practical solution, not because I enjoy using them as libraries. It sounds like the comparison is pandas (or similar) vs. a database?

tom_m today at 1:00 AM

Convenience will always be a personal preference.

tsss yesterday at 10:36 PM

For one, it doesn't have the god-awful pandas API.

tmpz22 yesterday at 10:27 PM

I've been researching DuckDB. While it has many technical merits, I think the main argument will be ease of use. It has a lot of the operational advantages of SQLite paired with strong extensibility and good, succinct documentation.

Folks who have been doing DevOps work are exasperated with crummy SaaS vendors or antiquated OSS options that have a high setup cost. DuckDB is just a mature project that offers an alternative, hence an easy fan favorite among hobbyists (I imagine at scale the opportunity costs change and it becomes less attractive).
