logoalt Hacker News

ozgrakkurtyesterday at 11:50 AM1 replyview on HN

Some more links that are inside the article:

- More info about skiplists: https://arxiv.org/pdf/2403.04582

- Performance comparison with B-tree ?: https://db.cs.cmu.edu/papers/2018/mod342-wangA.pdf

- Other blog from Anthithesis about writing their own db: https://antithesis.com/blog/2025/testing_pangolin/

Also I find it a bit hard to understand the performance outcome of this setup.

I know formats like parquet and databases like ClickHouse work better when duplicating data instead of doing joins. I guess BigQuery is similar.

The article is great but would be also interesting to learn how performance actually worked out with this.


Replies

nzyesterday at 4:47 PM

Back in 2014, I did an analysis of (single threaded) CPU-efficiency and RAM-efficiency of various data-structures (skiplists, slablists, avl-trees, rb-trees, b-trees):

https://nickziv.wordpress.com/wp-content/uploads/2014/02/vis...

I used whatever I could find on the internet at the time, so the comparison compares both algorithm and implementation (they were all written in C, but even slight changes to the C code can change performance -- uuavl performs much better than all other avl variants, for example). I suspect that a differently-programmed skip-list would not have performed quite so poorly.

The general conclusion from all this, is that any data-structure that can organize itself _around_ page-sizes and cache-sizes, will perform very well compared to structures that cannot.