Hacker News

Demiurge, yesterday at 12:36 AM

Yeah, if you want to work with GeoParquet and keep your data in that format, I can see how your example is easier to use. But that's not the format a lot of geospatial data is in. You might have shapefiles, GeoPackages, GeoJSON files, who knows? There is a lot of software, from QGIS to ESRI, for working with different formats to solve different problems. I don't think GeoParquet is that common, even though it might be the fastest geospatial vector format right now, and the article did not claim that it was. So for an average user trying to answer some GIS question, some ETL is pretty much a given. And given that, PostGIS and DuckDB both require some ETL and learning a query and analytics language. DuckDB might be an improvement, but it's certainly not as big a leap as the quote makes it out to be.
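For comparison, a rough sketch of what the ETL step can look like on the PostGIS side, assuming GDAL's `ogr2ogr` is installed and a PostGIS database is reachable. The helper `ogr2ogr_to_postgis`, the database name `gis`, and the table and file names are hypothetical; the code only builds the command, and actually running it (for example with `subprocess.run(argv, check=True)`) is left to the reader.

```python
import shlex
from typing import List


def ogr2ogr_to_postgis(src: str, dbname: str, table: str) -> List[str]:
    """Build an ogr2ogr argv that loads `src` (any GDAL-readable vector file,
    e.g. a shapefile or GeoPackage) into a PostGIS table.

    The helper itself is hypothetical; the flags are standard ogr2ogr options:
    -f picks the output driver and -nln sets the name of the new layer/table.
    """
    return [
        "ogr2ogr",
        "-f", "PostgreSQL",
        f"PG:dbname={dbname}",  # passed as one argv element, so no shell quoting needed
        src,
        "-nln", table,
    ]


if __name__ == "__main__":
    argv = ogr2ogr_to_postgis("parcels.shp", "gis", "parcels")
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(argv))
```

The point is just that with PostGIS the load is a separate tool and a separate step before any SQL runs.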


Replies

sroerick, yesterday at 4:07 PM

From the DuckDB geospatial docs:

SELECT * FROM './path/to/some/shapefile/dataset.shp';

-- some_table is a placeholder for your own table name (TABLE itself is a reserved word)
COPY some_table TO 'some/file/path/filename.geojson' WITH (FORMAT gdal, DRIVER 'GeoJSON', LAYER_CREATION_OPTIONS 'WRITE_BBOX=YES');

This seems extremely helpful to me.
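A minimal sketch of how those two doc snippets combine into a single in-process conversion from Python, assuming the `duckdb` package and its `spatial` extension are available. The function names `convert_to_geojson` and `sql_literal` are hypothetical, and wrapping `ST_Read` in a `COPY (SELECT ...)` subquery is my own variation on the doc example so that no intermediate table is needed.

```python
from typing import Protocol


class SqlConnection(Protocol):
    """Anything with an execute(sql) method, such as duckdb.DuckDBPyConnection."""

    def execute(self, query: str) -> object: ...


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def convert_to_geojson(con: SqlConnection, src: str, dst: str) -> str:
    """Read any GDAL-readable vector file with ST_Read and write it as GeoJSON.

    Mirrors the COPY example from the docs, but copies from a subquery
    instead of a table. Returns the COPY statement it executed.
    """
    con.execute("LOAD spatial;")
    copy_sql = (
        f"COPY (SELECT * FROM ST_Read({sql_literal(src)})) "
        f"TO {sql_literal(dst)} "
        "WITH (FORMAT gdal, DRIVER 'GeoJSON', LAYER_CREATION_OPTIONS 'WRITE_BBOX=YES');"
    )
    con.execute(copy_sql)
    return copy_sql


# Usage sketch (assumes `pip install duckdb`; INSTALL downloads the extension once):
#   import duckdb
#   con = duckdb.connect()
#   con.execute("INSTALL spatial;")
#   convert_to_geojson(con, "dataset.shp", "dataset.geojson")
```

Taking the connection as a parameter keeps the SQL-building logic testable without a database.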

jeffbee, yesterday at 1:25 AM

Yeah, here's just one example of a quality-of-life issue with DuckDB: even though it can glob files in other cases, the way it passes parameters to GDAL means that globs are taken literally instead of being expanded. So I can't query a directory of thirty million GeoJSON files. This isn't a problem in GeoPandas, because IPython, being a full interactive development environment, lets me expand the glob any way I choose.

I think this is a fundamental problem with the SQL pattern. You can try to make things just work, but when they fail, then what?
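One possible workaround for the glob issue, assuming (as described above) that `ST_Read` takes its path literally: expand the pattern in Python, the way you would in IPython, and generate batched DuckDB queries that read each file explicitly. The helpers `expand`, `batched`, and `union_queries` are hypothetical, and whether this holds up at thirty million files is untested.

```python
import glob
from itertools import islice
from typing import Iterable, Iterator, List


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def expand(pattern: str) -> Iterator[str]:
    """Expand the glob in Python; iglob yields matches lazily instead of building one huge list."""
    return glob.iglob(pattern, recursive=True)


def union_queries(paths: Iterable[str], batch_size: int = 1000) -> Iterator[str]:
    """Yield one query per batch: ST_Read on each file, stacked with UNION ALL BY NAME
    so files whose properties differ slightly still line up by column name."""
    for chunk in batched(paths, batch_size):
        yield "\nUNION ALL BY NAME\n".join(
            f"SELECT * FROM ST_Read({sql_literal(p)})" for p in chunk
        )
```

Each generated query could then be run on a connection with `spatial` loaded, e.g. `for q in union_queries(expand("data/**/*.geojson")): con.execute(q)`, with the results combined however suits the task. Batching keeps any single query from growing to millions of `UNION` branches.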
