logoalt Hacker News

idoubtityesterday at 8:26 PM6 repliesview on HN

My experience does not match theirs when compressing text and code:

> bzip might be suboptimal as a general-purpose compression format, but it’s great for text and code. One might even say the b in bzip stands for “best”.

I've just checked again with a 1GB SQL file. `bzip2 -9` shrinks it to 83MB. `zstd -19 --long` to 52MB.

Others have compressed the Linux kernel and found that bzip2's is about 15% larger than zstd's.


Replies

mppmtoday at 9:04 AM

What you are seeing here is probably the effect of window size. BZip has to perform the BWT strictly block-wise and is quite memory-hungry, so `bzip2 -9` uses a window size of 900KB, if I recall correctly. Dictionary-based algorithms are more flexible in this regard, and can gain a substantial advantage on very large and repetitive files. The article kind of forgets to mention this. Not that BZip isn't remarkably efficient for its simplicity, but it's not without limitations.

bigiaintoday at 2:40 AM

I suspect the reason for the difference here may be specific use case and the implications there on the size of the files? The author's use case is Lua files to run in Minecraft, and I strongly suspect their example file at 327KB is very much closer to "typical" for that use case than a 1GB SQL file.

It wouldn't surprise me at all that "more modern" compression techniques work better on larger files. It also wouldn't surprise me too much if there was no such thing as a 1GB file when bzip was originally written, according to Wikipedia bzip2 is almost 30 years old "Initial releases 18 July 1996". And there are mentions of the preceding bzip (without the 2) which must have been even earlier than that. In the mid/late 90s I was flying round the world trips with a dozen or so 380 or 500MB hard drives in my luggage to screw into our colo boxen in Singapore London and San Francisco (because out office only has 56k adsl internet).

show 1 reply
ac29today at 4:31 AM

> Others have compressed the Linux kernel and found that bzip2's is about 15% larger than zstd's

I compressed kernel 6.19.8 with zstd -19 --long and bzip3 (default settings). The latter compressed better and was about 8x faster.

cogman10yesterday at 8:55 PM

bzip is old and slow.

It was long surpassed by lzma and zstd.

But back in roughly the 00s, it was the best standard for compression, because the competition was DEFLATE/gzip.

show 2 replies
evikstoday at 2:07 AM

is SQL file text or code?

8n4vidtmkvmkyesterday at 9:49 PM

And here i got best compression out of xz for SQL.