logoalt Hacker News

theszyesterday at 5:57 PM0 repliesview on HN

Arithmetic coding of a single bit preserves ordering of encoded bits, if CDF(1) > CDF(0). If byte's encoding process is going from higher bits to lower bits, arithmetic coding (even with dynamic model) will preserve ordering of individual bytes.

In the end, arithmetic coding preserves ordering of encoded strings. Thus, comparison operations can be performed on the compressed representation of strings (and big-endian representations of integers and even floating point values), without the need to decompress data until that decompressed strings are needed.

Another view: strings are compared by memcmp as if they are mantissas with the base 256. "hi!" is 'h'(1/256)+'i'(1/256)^2+'!'(1/256)^3+0(1/256)^4 and then there are zeroes to the infinity. Arithmetic encoding represents encoded strings as mantissas where base is 2. Range coding can utilize other bases such as 256.