Hacker News

int_19h yesterday at 3:49 AM

Article has the numbers for their input:

uncompressed: 327005

(gzip) zopfli --i100: 75882

zstd -22 --long --ultra: 69018

xz -9: 67940

brotli -Z: 67859

lzip -9: 67651

bzip2 -9: 63727

bzip3: 61067
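For a quick sense of scale, here is a small sketch that turns the sizes listed above into percentages. It assumes the numbers are bytes (the comment gives no unit), and the helper names `percent_of_original` and `extra_vs` are made up for illustration.

```python
# Sizes copied from the list above; assumed to be bytes (no unit is given).
UNCOMPRESSED = 327005
SIZES = {
    "zopfli --i100 (gzip)": 75882,
    "zstd -22 --long --ultra": 69018,
    "xz -9": 67940,
    "brotli -Z": 67859,
    "lzip -9": 67651,
    "bzip2 -9": 63727,
    "bzip3": 61067,
}


def percent_of_original(size, original=UNCOMPRESSED):
    """Compressed size as a percentage of the uncompressed input."""
    return 100.0 * size / original


def extra_vs(tool, baseline, sizes=SIZES):
    """How much larger (in percent) tool's output is than baseline's."""
    return 100.0 * (sizes[tool] - sizes[baseline]) / sizes[baseline]


if __name__ == "__main__":
    # Smallest output first.
    for name, size in sorted(SIZES.items(), key=lambda kv: kv[1]):
        print(f"{name:26s} {size:6d}  {percent_of_original(size):5.1f}% of original")
    gap = extra_vs("zstd -22 --long --ultra", "bzip2 -9")
    print(f"zstd output is {gap:.1f}% larger than bzip2's on this input")
```

This only restates the article's figures for one input; it says nothing about speed or about other kinds of data.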


lucb1e yesterday at 3:59 AM

This matches my experience compressing structured text, btw. Bzip2 beats every other tool out there on compression ratio and, sadly, also takes the longest to decompress

OP says decompression time is so high because it has similar properties to a memory-hard password hash: it's bandwidth-bound due to the random access requirement. Even xz decompresses 2.5x faster, and I don't find it particularly fast

This is why I switched away from it, even for text compression: searching for anything that isn't near the beginning of a large file is tedious. My use-case for compression is generally not like OP's, that is, compressing 100KB so that it can fit into Minecraft (if I understood their purpose correctly); I compress files because they take too much disk space (gigabytes). But if I never wanted to access them, I wouldn't store them, so decompression speed matters. So I kinda do agree with GP that Bzip2 has limited uses when Zstd costs just a few % more bytes to store for over an order of magnitude more speed (1GB/s instead of 45MB/s)

Edit: And all that ignores non-json/xml/code/text compression tasks, where Bzip2/LZMA doesn't give you the best compression ratio. I'd argue it is premature optimization to use Bzip2 without a very specific use-case like OP has for very good code compression ratios and a simple decoder
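If you want to check the ratio vs. decompression speed trade-off on your own data, a rough sketch using only Python's standard library could look like this. The sample generator `make_sample` is a hypothetical stand-in for real structured text, and only zlib, bz2 and lzma (the xz format) are covered; Zstd and bzip3 are left out because they need third-party bindings on most Python versions. Timings from a script like this depend heavily on the machine and the input, so treat them as a rough indication only.

```python
import bz2
import lzma
import time
import zlib


def make_sample(n_records=20000):
    """Build synthetic JSON-like lines (a hypothetical stand-in for real data)."""
    lines = [
        f'{{"id": {i}, "name": "user{i % 97}", "active": {str(i % 3 == 0).lower()}}}'
        for i in range(n_records)
    ]
    return "\n".join(lines).encode("utf-8")


# Each entry maps a label to a (compress, decompress) pair from the stdlib.
CODECS = {
    "zlib -9": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "bz2 -9": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "lzma preset 9 (xz)": (lambda d: lzma.compress(d, preset=9), lzma.decompress),
}


def benchmark(data, repeats=3):
    """Return {label: (compressed_size, best_decompress_seconds)}."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        blob = compress(data)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            restored = decompress(blob)
            best = min(best, time.perf_counter() - start)
        # Sanity check: the round trip must reproduce the input exactly.
        assert restored == data
        results[name] = (len(blob), best)
    return results


if __name__ == "__main__":
    sample = make_sample()
    for name, (size, seconds) in benchmark(sample).items():
        ratio = 100.0 * size / len(sample)
        print(f"{name:20s} {size:8d} bytes ({ratio:5.1f}%)  decompress {seconds * 1000:7.2f} ms")
```

Swapping `make_sample()` for the contents of a real file (read in binary mode) gives numbers that are more relevant to your own use-case.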

ac29 yesterday at 4:15 AM

I tested bzip3 vs xz compressing enwik8: bzip3 was 7x faster and had ~20% better compression
