Hacker News

ajfriend, yesterday at 8:02 AM

I appreciate the reply! So, I might be wrong here, but I think we may be talking about two different layers. I’m also not very familiar with the literature, so I’d be interested if you could point me to relevant work or explain where my understanding is off.

To me, the big selling point of H3 is that once you’re "in the H3 system", many operations don’t need to worry about geometry at all. Everything is discrete. H3 cells are nodes in a tree with prefixes that can be exploited, and geometry or congruency never really enter the picture at this layer.
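A minimal sketch of the "tree with prefixes" idea, assuming the 64-bit index layout described in H3's public documentation (1 reserved bit, 4 mode bits, 3 reserved bits, 4 resolution bits, 7 base-cell bits, then fifteen 3-bit digits with unused digits set to 7). The function names are illustrative, not the H3 library's API, and the sketch does no validation of the input cell.

```python
MAX_RES = 15
RES_SHIFT = 52                 # resolution lives in bits 52..55
RES_MASK = 0xF << RES_SHIFT


def resolution(cell: int) -> int:
    """Read the 4-bit resolution field from a cell index."""
    return (cell & RES_MASK) >> RES_SHIFT


def digit_shift(res: int) -> int:
    """Bit offset of the 3-bit digit belonging to resolution `res` (1..15)."""
    return 3 * (MAX_RES - res)


def cell_to_parent(cell: int, parent_res: int) -> int:
    """Truncate the digit path: pure bit manipulation, no geometry."""
    res = resolution(cell)
    if not 0 <= parent_res <= res:
        raise ValueError("parent_res must be between 0 and the cell's resolution")
    out = (cell & ~RES_MASK) | (parent_res << RES_SHIFT)
    for r in range(parent_res + 1, res + 1):
        out |= 0b111 << digit_shift(r)   # mark finer digits as unused
    return out


def is_descendant(cell: int, ancestor: int) -> bool:
    """A cell descends from `ancestor` if truncating its path yields `ancestor`."""
    a_res = resolution(ancestor)
    return resolution(cell) >= a_res and cell_to_parent(cell, a_res) == ancestor


cell = 0x8928308280FFFFF            # a resolution-9 index
parent = cell_to_parent(cell, 8)
print(hex(parent), is_descendant(cell, parent))
```

Everything here is integer masking and comparison, which is the sense in which geometry does not enter at this layer.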

Where geometry and congruency do come in is when you translate continuous data (points, polygons, and so on) into H3. In that scenario, I can totally see congruency being a useful property for speed, and I can see H3 being slower than systems that are optimized for that conversion step.

However, in most applications I’ve seen, the continuous-to-H3 conversion happens upstream, or at least isn’t the bottleneck. The primary task is usually operating on already "hexagonified" data, such as joins or other set operations on discrete cell IDs.
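As a rough illustration of operating on already-converted data, here is a sketch of a hash join and a set intersection keyed purely on cell IDs. The datasets, payload names, and the `hash_join` helper are hypothetical; the IDs are opaque placeholder integers (not checked to be valid H3 cells), since only equality matters at this layer.

```python
from collections import defaultdict

# Hypothetical inputs: (cell_id, payload) pairs produced upstream.
CELL_A, CELL_B, CELL_C = 0x8928308280FFFFF, 0x8928308280BFFFF, 0x89283082873FFFF
trips = [(CELL_A, "trip-1"), (CELL_B, "trip-2"), (CELL_A, "trip-3")]
stores = [(CELL_A, "store-A"), (CELL_C, "store-B")]


def hash_join(left, right):
    """Equi-join on cell ID: build a hash index on one side, probe with the other."""
    index = defaultdict(list)
    for cell, payload in right:
        index[cell].append(payload)
    return [(cell, l, r) for cell, l in left for r in index.get(cell, ())]


joined = hash_join(trips, stores)

# Set operations are plain integer-set operations.
shared_cells = {c for c, _ in trips} & {c for c, _ in stores}
print(joined, shared_cells)
```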

Am I understanding the bottleneck correctly?


Replies

jandrewrogers, yesterday at 5:58 PM

A DGGS is a specialized spatial unit system that, depending on the design, allows you to elide expensive computation for a select set of operations with the tradeoff that other operations become much more expensive.

H3 is optimized for equal-area point aggregates. Congruency does not matter for these aggregates because there is only a single resolution. To your point, in H3 these are implemented as simple scalar counting aggregates -- little computational geometry required. Optimized implementations can generate these aggregates more or less at the speed of memory bandwidth. Ideal for building heat maps!

H3 works reasonably for sharding spatial joins if all of the cells have the same resolution and are therefore disjoint. The number of records per cell can be highly variable so this is still suboptimal; adjusting the cell size to get better distribution just moves the suboptimality around. Polygon data adds further complexity.

The singular importance of congruence as a property is that it enables efficient and uniform sharding of spatial data for distributed indexes, regardless of data distribution or geometry size. The practical benefits follow from efficient and scalable computation over data stored in cells of different size, especially for non-point geometry.
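To make the sharding point concrete, here is a toy sketch using a planar quadtree on the unit square, which is congruent in the sense that four children exactly tile their parent. It is not any particular DGGS, and `adaptive_shards`, `capacity`, and `max_depth` are illustrative names. Because children partition the parent exactly, a crowded cell can be split locally without touching its neighbors, so shards of different sizes coexist and stay disjoint.

```python
def child_of(x, y, x0, y0, size):
    """Return (digit, child_x0, child_y0) for the quadrant containing (x, y)."""
    half = size / 2
    dx = 1 if x >= x0 + half else 0
    dy = 1 if y >= y0 + half else 0
    return str(2 * dy + dx), x0 + dx * half, y0 + dy * half


def adaptive_shards(points, capacity, max_depth=16):
    """Split cells until each holds at most `capacity` points (or max_depth is hit)."""
    shards = {}
    stack = [("", 0.0, 0.0, 1.0, list(points))]
    while stack:
        cell_id, x0, y0, size, pts = stack.pop()
        if len(pts) <= capacity or len(cell_id) >= max_depth:
            if pts:
                shards[cell_id] = pts
            continue
        buckets = {}
        for x, y in pts:
            digit, cx, cy = child_of(x, y, x0, y0, size)
            buckets.setdefault((digit, cx, cy), []).append((x, y))
        for (digit, cx, cy), sub in buckets.items():
            stack.append((cell_id + digit, cx, cy, size / 2, sub))
    return shards


# A dense cluster plus two isolated points: heavily skewed data.
points = [(0.1 + i * 0.001, 0.1 + i * 0.001) for i in range(100)]
points += [(0.9, 0.9), (0.6, 0.2)]
shards = adaptive_shards(points, capacity=32)
for cell_id in sorted(shards):
    print(cell_id, len(shards[cell_id]))
```

The isolated points end up in large, shallow cells while the cluster is split into several deeper ones, and no shard ID is a prefix of another.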

Some DGGS optimized for equal-area point aggregates are congruent, such as HEALPix[0]. However, that congruency comes at high computational cost and unreasonably difficult technical implementation. Not recommended for geospatial use cases.

Congruence has an important challenge that most overlook: geometric relationships on a 2-spheroid can only be approximated on a discrete computer. If you are not careful, quantization to the discrete during computation can effectively create tiny gaps between cells or tiny slivers of overlap. I've seen bugs in the wild from when the rare point lands in one of these non-congruent slivers. Mitigating this can be costly.
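A one-dimensional analogy of the sliver problem, offered as an illustration rather than as how any DGGS computes boundaries: two adjacent half-open cells share a boundary at 0.3, but each cell derives that boundary through a different floating-point expression. The results differ by one unit in the last place, which is enough to produce a gap or an overlap.

```python
def in_cell(x, lo, hi):
    """Membership test for the half-open interval [lo, hi)."""
    return lo <= x < hi


boundary_1 = 3 / 10    # rounds to the double nearest 0.3
boundary_2 = 0.1 * 3   # rounds one ulp higher than boundary_1
x = 0.3

# Gap: cell A ends at the lower value, cell B starts at the higher one.
gap_a = in_cell(x, 0.2, boundary_1)
gap_b = in_cell(x, boundary_2, 0.4)

# Overlap: swap which expression each cell uses.
overlap_a = in_cell(x, 0.2, boundary_2)
overlap_b = in_cell(x, boundary_1, 0.4)

# One mitigation: both cells share a single canonical boundary value.
shared = boundary_1
fixed_a = in_cell(x, 0.2, shared)
fixed_b = in_cell(x, shared, 0.4)

print(gap_a, gap_b, overlap_a, overlap_b, fixed_a, fixed_b)
```

In the gap case the point belongs to neither cell, and in the overlap case it belongs to both; only the shared-boundary version assigns it to exactly one.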

This is how we end up with DGGS that embed the 2-spheroid in a synthetic Euclidean 3-space. Quantization issues on the 2-spheroid become trivial in 3-space. People tend to hate two things about these DGGS designs though, neither of which is a technical critique. First, these are not equal-area designs like H3; cell size does not indicate anything about the area on the 2-sphere. Since they are efficiently congruent, the resolution can be locally scaled as needed so there are no technical ramifications. It just isn't intuitive like tiling a map or globe. Second, if you do project the cell boundaries onto the 2-sphere and then project that geometry into something like Web Mercator for visualization, it looks like some kind of insane psychedelic hallucination. These cells are designed for analytic processing, not visualization. The data itself is usually WGS84 and can be displayed in exactly the same way you would if you were using PostGIS; the DGGS just doesn't act as a trivial built-in visualization framework.

Taking data stored in a 3-space embedding and converting it to H3-ified data or aggregates on demand is simple, efficient, and highly scalable. I often do things this way even when the data will only ever be visualized in H3 because it scales better.

[0] https://en.wikipedia.org/wiki/HEALPix