While we could use zigzag encoding, (i>>31) ^ (i<<1) for a 32-bit i, to convert the SLEB128-encoded type/addend to ULEB128 instead, the generated code is inferior to or on par with SLEB128 for one-byte encodings on x86, AArch64, and RISC-V. I haven't tried wider values, but zigzag encoding is likely slower there as well.
  // One-byte case for SLEB128: sign-extend the 7-bit payload (bit 6 is the sign bit)
  int64_t from_signext(uint64_t v) {
    return v >= 64 ? (int64_t)v - 128 : (int64_t)v;
  }

  // One-byte case for ULEB128 with zigzag encoding
  int64_t from_zigzag(uint64_t z) {
    return (int64_t)((z >> 1) ^ -(z & 1));
  }
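For anyone who wants to poke at the two decoders above, here is a minimal round-trip sketch in C. The names zigzag_encode, zigzag_decode, sleb128_one_byte, and check_roundtrip are made up for illustration and are not from the post. It assumes a 64-bit value and an arithmetic right shift on signed integers, which is what GCC and Clang do but is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Zigzag-encode: maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
   The left shift is done on the unsigned type to avoid undefined
   behavior for negative inputs. ASSUMPTION: i >> 63 is an arithmetic
   shift, yielding all ones for negative i and zero otherwise. */
uint64_t zigzag_encode(int64_t i) {
  return ((uint64_t)i << 1) ^ (uint64_t)(i >> 63);
}

/* Zigzag-decode: the low bit selects whether to flip the remaining bits. */
int64_t zigzag_decode(uint64_t z) {
  return (int64_t)((z >> 1) ^ (0 - (z & 1)));
}

/* Decode a one-byte SLEB128 value: keep the 7 payload bits and
   sign-extend from bit 6. The continuation bit (0x80) is ignored here. */
int64_t sleb128_one_byte(uint8_t b) {
  int64_t v = b & 0x7f;
  return v >= 64 ? v - 128 : v;
}

/* Hypothetical helper: aborts if encode followed by decode changes v. */
void check_roundtrip(int64_t v) {
  assert(zigzag_decode(zigzag_encode(v)) == v);
}
```

Both one-byte forms cover the same range, -64 to 63. The difference is only in how the decoder recovers the sign, which is why comparing the generated code per target is the interesting part.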
Zigzag encoding is a common compression scheme and is used in the Parquet format. It is fun to speculate that these kinds of tricks could be applied there, in something that sits under the hood of so much data processing and analytics.
Is the matrix for bit shifting upside down or am I momentarily making a really dumb mistake here? Edit: nvm I missed the footnote which clarifies that for whatever reason the instruction populates the matrix from bottom to top.
This sort of analysis is great.
Now why can't compilers do this sort of thing automatically?
Almost any problem seems possible to speed up 1000x with AVX-512 plus days of thought, compared to the naive version written as a Python loop. If we could automate that whole process for big codebases, the performance gains could be huge.
Worth mentioning that MeshOptimizer (https://github.com/zeux/meshoptimizer) has become one of a handful of 'hidden champion' pillar libraries that probably carries half of the gaming industry.
Basically the curl of asset pipelines ;)
https://github.com/zeux/meshoptimizer/discussions/986