Hacker News

rurban yesterday at 3:11 PM

This misses the very best compressors, both by Fabrice Bellard: NNCP (https://bellard.org/nncp/) and, for text, ts_zip.


Replies

lucb1e today at 1:35 AM

Interesting approach. The fastest of the four compressors presented ("LSTM (small)") is 24 times slower than xz, and their best compressor ("LSTM (large1)") is 429 times slower than xz, let alone gzip or, presumably, zstandard (not shown in the paper). They also ran the models on different CPUs (a Core i5 and a Xeon E5), so the results are not even comparable within the same paper. A linked webpage lists the author's decompression times, which are even worse: xz decompresses roughly twelve thousand times faster (50 MB/s vs. 4 kB/s), even when nncp has an Nvidia RTX 3090 and 24 GB of RAM available to it, which apparently speeds it up by 3x compared to the original CPU implementation.
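To put the quoted throughput figures in perspective, here is a small back-of-the-envelope sketch. It assumes decimal units (1 MB = 10^6 bytes, 1 kB = 10^3 bytes) and that both rates refer to decompressed output; the 1 GB payload and the helper names are made up for illustration.

```python
# Throughput figures quoted in the comment above.
# Assumption: decimal units (1 MB = 10**6 bytes, 1 kB = 10**3 bytes).
XZ_BYTES_PER_S = 50 * 10**6    # xz: about 50 MB/s
NNCP_BYTES_PER_S = 4 * 10**3   # nncp on an RTX 3090: about 4 kB/s


def speed_ratio(fast: float, slow: float) -> float:
    """How many times faster the first rate is than the second."""
    return fast / slow


def seconds_to_decompress(size_bytes: int, bytes_per_s: float) -> float:
    """Wall-clock time to produce size_bytes of output at a constant rate."""
    return size_bytes / bytes_per_s


ratio = speed_ratio(XZ_BYTES_PER_S, NNCP_BYTES_PER_S)

one_gb = 10**9  # hypothetical payload: 1 GB of decompressed data
xz_s = seconds_to_decompress(one_gb, XZ_BYTES_PER_S)
nncp_s = seconds_to_decompress(one_gb, NNCP_BYTES_PER_S)

print(f"xz is {ratio:,.0f}x faster")
print(f"1 GB: xz {xz_s:.0f} s, nncp {nncp_s / 86400:.1f} days")
```

Under those assumptions the ratio comes out at 12,500, and a gigabyte that xz unpacks in about 20 seconds would take nncp close to three days.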

At half the size of xz's output, there can be applications for this, but you need to:

- not care about compression time

- not be constrained on hardware (7.6 GB RAM, and ideally a GPU to run it on)

- not care about decompression time either

- have text data (I can't find benchmarks on anything other than English Wikipedia text, but various sources emphasize that it's a text compressor, so presumably it is no good for e.g. a spacecraft that needs to transmit sensor/research data over a weak connection, even if spending the power budget on running a GPU instead of pumping it into the antenna were the optimal trade-off)
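Since the only benchmarks I could find are on Wikipedia text, one way to judge the trade-off for your own data is to first measure what the conventional compressors achieve on a sample of it. A minimal sketch using only Python's standard library: `lzma` produces the xz format and `zlib` is the DEFLATE algorithm that gzip uses. The `measure` helper and the sample data are hypothetical, and this does not run nncp itself.

```python
import lzma
import time
import zlib


def measure(name, compress, decompress, data):
    """Time one compress/decompress round trip and report the size ratio."""
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    restored = decompress(packed)
    t2 = time.perf_counter()
    if restored != data:
        raise ValueError(f"{name}: round trip did not reproduce the input")
    return {
        "name": name,
        "ratio": len(packed) / len(data),  # smaller is better
        "compress_s": t1 - t0,
        "decompress_s": t2 - t1,
    }


# Hypothetical sample; replace with a representative chunk of your own data.
sample = b"The quick brown fox jumps over the lazy dog. " * 20000

results = [
    measure("xz (lzma)", lzma.compress, lzma.decompress, sample),
    measure("deflate (zlib)", zlib.compress, zlib.decompress, sample),
]

for r in results:
    print(
        f"{r['name']:>15}: ratio {r['ratio']:.4f}, "
        f"compress {r['compress_s']:.3f} s, decompress {r['decompress_s']:.3f} s"
    )
```

Timings depend on the machine, and the repetitive sample above compresses far better than real text would, so only numbers from your own data say anything about whether halving xz's output is worth the cost.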