Some more links that are inside the article:
- More info about skiplists: https://arxiv.org/pdf/2403.04582
- Performance comparison with B-tree ?: https://db.cs.cmu.edu/papers/2018/mod342-wangA.pdf
- Other blog from Anthithesis about writing their own db: https://antithesis.com/blog/2025/testing_pangolin/
Also I find it a bit hard to understand the performance outcome of this setup.
I know formats like parquet and databases like ClickHouse work better when duplicating data instead of doing joins. I guess BigQuery is similar.
The article is great but would be also interesting to learn how performance actually worked out with this.
Back in 2014, I did an analysis of (single threaded) CPU-efficiency and RAM-efficiency of various data-structures (skiplists, slablists, avl-trees, rb-trees, b-trees):
https://nickziv.wordpress.com/wp-content/uploads/2014/02/vis...
I used whatever I could find on the internet at the time, so the comparison compares both algorithm and implementation (they were all written in C, but even slight changes to the C code can change performance -- uuavl performs much better than all other avl variants, for example). I suspect that a differently-programmed skip-list would not have performed quite so poorly.
The general conclusion from all this, is that any data-structure that can organize itself _around_ page-sizes and cache-sizes, will perform very well compared to structures that cannot.