Hacker News

drob518, today at 3:34 PM

I’m confused by the benchmark detail. It says that the “on disk” size for pgit is always larger than the git aggressive size, but then it breaks out just the pgit data size and says that’s typically smaller. If you’re using PG to implement this, don’t you have to account for the PG storage, too, in your comparison? My takeaway is that pgit always has a larger storage requirement than git with aggressive compression. Or am I reading that wrong? Obviously, pgit also brings features like SQL querying that git doesn’t have and that you might prioritize more highly. But the author seems to be pushing the storage benefit hard.


Replies

ImGajeed76, today at 3:41 PM

good question! the "pgit actual" column tries to compare just the compression algorithms, similar to how the git side only counts the .pack file and not .idx/.rev/.bitmap or filesystem overhead. so both sides strip their "container" overhead to make it a fair comparison. but you're totally right that in practice the on-disk size is what you actually pay. that's why both numbers are in the table. and yes, pgit on-disk is usually larger than git aggressive. the tradeoff is that you get SQL queryability over your entire history, which git just can't do natively.
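For anyone who wants to see the git half of that distinction on their own repo, here is a rough sketch (not the benchmark's actual script). It assumes the standard `.git/objects/pack` layout; the function name `pack_sizes` is made up for illustration. It returns the bytes in `.pack` files alone next to the bytes of everything in the pack directory (`.idx`, `.rev`, `.bitmap`, and so on).

```python
from pathlib import Path


def pack_sizes(git_dir):
    """Return (pack_only_bytes, pack_dir_bytes) for a .git directory.

    pack_only_bytes counts just the *.pack files (the compressed object data).
    pack_dir_bytes counts every regular file in objects/pack, which also
    includes "container" files such as .idx, .rev, and .bitmap.
    Hypothetical helper; raises FileNotFoundError if objects/pack is missing.
    """
    pack_dir = Path(git_dir) / "objects" / "pack"
    pack_only = 0
    total = 0
    for entry in pack_dir.iterdir():
        if not entry.is_file():
            continue  # skip any subdirectories
        size = entry.stat().st_size
        total += size
        if entry.suffix == ".pack":
            pack_only += size
    return pack_only, total


if __name__ == "__main__":
    try:
        data, on_disk = pack_sizes(".git")
        print(f"pack data only: {data} bytes, pack directory: {on_disk} bytes")
    except FileNotFoundError:
        print("no .git/objects/pack directory here")
```

This only measures file sizes as reported by the filesystem, so it ignores block-level overhead and loose objects. The first number corresponds to the "strip the container" comparison, the second is closer to what you actually pay on disk.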