
jeffbee · yesterday at 1:25 AM · 3 replies

Yeah, just an example of a QoL issue with DuckDB: even though it can glob files in other cases, the way it passes parameters to GDAL means that globs are taken literally instead of expanded. So I can't query a directory with thirty million geojson files. This is not a problem in geopandas because ipython, being a full interactive development environment, allows me to produce the glob any way I choose.

I think this is a fundamental problem with the SQL pattern. You can try to make things just work, but when they fail, then what?


Replies

maxxen · yesterday at 1:35 AM

I think this is just because it hasn't been implemented in spatial yet. DuckDB is currently going through a pretty big refactor of the way we glob/scan/union multiple files, with all the recent focus on data lake formats, but my plan is to get to it in spatial after the next release, when that part of the code has stabilized a bit.

broner · yesterday at 9:26 PM

You can use DuckDB in ipython to solve the globbing issue. Then you don't have to worry about OOMs with geopandas.
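
A minimal sketch of that workaround, assuming the files share a schema and that the spatial extension's `ST_Read` is called once per file: expand the glob in Python, then build one `UNION ALL` query per batch of paths. The helper names `batched_paths` and `st_read_union` and the batch size are hypothetical, chosen only for illustration.

```python
import glob
from itertools import islice


def batched_paths(pattern, batch_size=1000):
    """Yield lists of at most batch_size paths matching a glob pattern.

    glob.iglob is lazy, so the full list of matches is never held in memory.
    """
    paths = glob.iglob(pattern)
    while True:
        batch = list(islice(paths, batch_size))
        if not batch:
            return
        yield batch


def st_read_union(paths):
    """Build one SQL query that reads each path with ST_Read and unions the rows."""
    selects = []
    for path in paths:
        quoted = path.replace("'", "''")  # escape single quotes for a SQL string literal
        selects.append(f"SELECT * FROM ST_Read('{quoted}')")
    return "\nUNION ALL\n".join(selects)
```

With the `duckdb` Python package installed, each batch could then be run with something like `con = duckdb.connect()`, `con.install_extension("spatial")`, `con.load_extension("spatial")`, and `con.sql(st_read_union(batch))` inside a loop over `batched_paths(...)`. Whether this scales to tens of millions of files is untested here; the batch size would need tuning.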

ffsm8 · yesterday at 5:47 AM

> fundamental problem with the SQL pattern.

SQL is a DSL and yes, all Domain Specific Languages will only enable what the engine parsing the DSL supports.

But every SQL database I'm aware of lets you write custom extensions, which are exactly that: they extend the base functionality of the database with new paradigms. E.g. PostGIS enabling geospatial in Postgres, or the extensions that enable fuzzy matching/searching.
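
As a small-scale analogue of that idea (a user-defined function, not a full extension like PostGIS), here is a sketch that teaches SQLite a fuzzy-matching function from Python. The `similarity` function, the `places` table, and the 0.7 threshold are all made up for illustration; `sqlite3.Connection.create_function` and `difflib.SequenceMatcher` are standard library.

```python
import sqlite3
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_search(names, needle, threshold=0.7):
    """Return the names whose similarity to needle meets the threshold."""
    con = sqlite3.connect(":memory:")
    # Register the Python function so SQL queries can call similarity(x, y).
    con.create_function("similarity", 2, similarity)
    con.execute("CREATE TABLE places (name TEXT)")
    con.executemany("INSERT INTO places VALUES (?)", [(n,) for n in names])
    rows = con.execute(
        "SELECT name FROM places WHERE similarity(name, ?) >= ? ORDER BY name",
        (needle, threshold),
    ).fetchall()
    con.close()
    return [row[0] for row in rows]
```

The query itself stays plain SQL; only the engine underneath gained a new capability, which is the point being made about extensions.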

And as SQL is pretty much a Turing-complete DSL, there is very little you can't do with it, even if the syntax might not agree with everyone.
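
To give a flavor of that claim, here is a sketch of iteration inside a single query using a recursive common table expression, run through Python's built-in `sqlite3`. It only demonstrates looping with state, not Turing-completeness itself, and the helper name `fibonacci_sql` is hypothetical.

```python
import sqlite3


def fibonacci_sql(n):
    """Return the first n Fibonacci numbers (n >= 1), computed entirely in SQL."""
    con = sqlite3.connect(":memory:")
    rows = con.execute(
        """
        WITH RECURSIVE fib(i, a, b) AS (
            SELECT 0, 0, 1                      -- base row: index 0, F(0), F(1)
            UNION ALL
            SELECT i + 1, b, a + b FROM fib     -- step: shift the pair forward
            WHERE i + 1 < ?                     -- stop after n rows
        )
        SELECT a FROM fib ORDER BY i
        """,
        (n,),
    ).fetchall()
    con.close()
    return [row[0] for row in rows]
```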