
mppm, today at 9:04 AM

What you are seeing here is probably the effect of window size. bzip2 has to perform the BWT strictly block-wise and is quite memory-hungry, so `bzip2 -9` uses a block (window) size of 900 kB, if I recall correctly. Redundancy that spans more than one block is invisible to it. Dictionary-based algorithms are more flexible in this regard: they can reference matches much further back, and so can gain a substantial advantage on very large, repetitive files. The article kind of forgets to mention this. Not that bzip2 isn't remarkably efficient for its simplicity, but it's not without limitations.
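
A rough way to see the block-size effect for yourself, using only Python's standard `bz2` and `lzma` modules. This is a sketch, not a benchmark: the helper name `compare_sizes` and the synthetic input (one random chunk repeated twice) are my own choices for illustration. The idea is that the only redundancy in the input is a repeat at a distance of 1,000,000 bytes, which is further than one bzip2 block but well within the dictionary that LZMA uses at its default preset (several MiB, as far as I know).

```python
import bz2
import lzma
import os


def compare_sizes(chunk_size):
    """Compress one random chunk repeated twice; return (bz2_size, lzma_size).

    Hypothetical helper for illustration only.
    """
    chunk = os.urandom(chunk_size)  # incompressible on its own
    data = chunk + chunk            # the only redundancy is the long-range repeat
    bz2_size = len(bz2.compress(data, compresslevel=9))  # level 9 = largest block size
    lzma_size = len(lzma.compress(data))                 # default preset
    return bz2_size, lzma_size


if __name__ == "__main__":
    bz2_size, lzma_size = compare_sizes(1_000_000)
    print(f"input: 2000000 bytes, bz2 -9: {bz2_size}, lzma: {lzma_size}")
```

With a 1,000,000-byte chunk I would expect bzip2 to get essentially no compression, since no single block contains both copies, while LZMA should come out at roughly half the input size. Real files are obviously less extreme than this, but the mechanism is the same.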