Hacker News

wenc 05/03/2025

I'm a big fan of DuckDB and I do geospatial analysis, mostly around partitioning geographies (into Uber H3 hexagons), calculating Haversine distances, calculating areas of geometries, figuring out which geometry a point falls in, etc. Many of these features have existed in some form or other in geopandas or PostGIS, so in terms of functionality DuckDB's spatial extensions bring nothing new.
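For readers who haven't met it, the Haversine distance mentioned above fits in a few lines of plain Python. This is only an illustrative sketch: it assumes a spherical Earth with a mean radius of 6371.0088 km, and the name `haversine_km` is made up for this example, not an API of DuckDB, geopandas or PostGIS.

```python
import math

# Assumption: spherical Earth with the mean radius in kilometres.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)

    # Haversine of the central angle between the two points.
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)

    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


if __name__ == "__main__":
    # Paris to London, coordinates in decimal degrees.
    print(haversine_km(48.8566, 2.3522, 51.5074, -0.1278))
```

Because it treats the Earth as a sphere, the result will differ slightly from an ellipsoidal geodesic distance.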

But what DuckDB does as an engine is let me work directly on Parquet/GeoParquet files at scale (vectorized and parallelized) on my local desktop. It beats geopandas in that respect. It's a quality-of-life improvement, to say the least.

DuckDB also has an extension architecture that admits more exotic geospatial features like Hilbert curves and Uber H3 support.

https://duckdb.org/docs/stable/extensions/spatial/functions....

https://duckdb.org/community_extensions/extensions/h3.html


Replies

sroerick 05/04/2025

I totally agree with this. DuckDB for me was a huge QoL improvement just working with random datasets. I found it much easier to explore datasets using DuckDB rather than Pandas, Postgres or Databricks.

The spatial features were just barely out when I was last doing a lot of heavy geospatial work, but even then they were very nice.

As an aside, I had a junior who would just load datasets into PowerBI to explore them for the first time, and that was actually a shockingly useful workflow.

pandas is very nice and was my bread and butter for a long time, but I frequently ran into memory issues and problems at scale with pandas, which I would never hit with polars or DuckDB. I'm not sure if this holds true today, as I know there have been updates, but it was certainly a problem then. geopandas ran into the same issues.

Just using GDAL and other libraries out of the box is frankly not a great experience. If you have a QGIS (another wonderful tool) workflow, it's frustrating to be dropping into Jupyter notebooks to do translations, but that seemed to be the best option.

In general, it just feels like geospatial analysis is about 10 years behind regular data analysis. Shapefiles are common because of ESRI dominance, but frankly they're not a great format. PostGIS is great, geopandas is great, but there are a lot more things in the data ecosystem than just Postgres and pandas. PowerBI barely had geospatial support a couple of years ago. I think PowerBI Shapemaps exclusively used TopoJSON?

All of this is to say, DuckDB geospatial is very cool and helpful.

wodenokoto 05/04/2025

Why do you use Haversine over geodesic or reprojection?

I’ve been doing the reprojection thing, projecting coordinates to a “local” CRS, for previous projects, mainly because that’s what geopandas recommends and is built around. But I am reaching a stage where I’d like to calculate distances for objects all over the globe, and I’m genuinely interested to learn what’s a good choice here.

everybodyknows 05/04/2025

Looking just at the Hilbert reference, I'm wondering why there is no function to return, for a given level of precision, the set of segments along the curve corresponding to a sub-rectangle of the space. Is this functionality packaged up elsewhere?
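To make the operation concrete, here is a rough sketch in plain Python of what such a function could look like. Everything in it is an assumption for illustration: `hilbert_index` and `hilbert_ranges` are hypothetical names, the curve uses the textbook orientation on a 2^order by 2^order grid (which may not match the encoding DuckDB uses), and the rectangle is covered by brute force, enumerating every cell rather than using the quadtree descent a real implementation would want.

```python
def hilbert_index(order, x, y):
    """Map integer cell (x, y) on a 2**order grid to its distance along the curve."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_ranges(order, x_min, y_min, x_max, y_max):
    """Return inclusive (start, end) curve-index ranges covering a cell rectangle.

    Bounds are inclusive cell coordinates. Brute force: O(cells in rectangle).
    """
    indices = sorted(
        hilbert_index(order, x, y)
        for x in range(x_min, x_max + 1)
        for y in range(y_min, y_max + 1)
    )
    ranges = []
    for d in indices:
        if ranges and d == ranges[-1][1] + 1:
            # Extend the current run of consecutive indices.
            ranges[-1] = (ranges[-1][0], d)
        else:
            ranges.append((d, d))
    return ranges


if __name__ == "__main__":
    # A one-cell-wide column on a 4x4 grid splits into two curve segments.
    print(hilbert_ranges(2, 1, 0, 1, 3))
```

Each returned pair could then drive a range scan over data sorted by Hilbert index, which is presumably the use case behind the question.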