I'd be interested in seeing an explanation of the code, since it looks pretty incomprehensible to me. Per the arbitrary rules I set for myself, I'm not allowed to precompute or hardcode the wheel (it looks like this implementation uses a hardcoded wheel of size 2×3×5 = 30). I wonder whether, and by how much, performance would suffer if the coprime remainders were computed at runtime and stored in memory instead of being handed directly to the compiler.
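To make "computing and storing the coprime remainders" concrete, here is a minimal sketch of what I have in mind. It is written in Python purely for illustration and is not taken from the implementation being discussed; the names `first_primes` and `build_wheel` are hypothetical. It only shows how the table would be built at startup and says nothing about the performance question.

```python
from math import gcd


def first_primes(k):
    """Return the first k primes by trial division (fine for a tiny k)."""
    primes = []
    candidate = 2
    while len(primes) < k:
        # candidate is prime if no smaller prime divides it
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def build_wheel(basis):
    """Build a wheel from a list of basis primes at runtime.

    Returns (modulus, residues, gaps), where residues are the values in
    [1, modulus] coprime to the modulus and gaps[i] is the step from
    residues[i] to the next residue, wrapping around the wheel.
    """
    modulus = 1
    for p in basis:
        modulus *= p
    residues = [r for r in range(1, modulus + 1) if gcd(r, modulus) == 1]
    count = len(residues)
    gaps = [residues[(i + 1) % count] - residues[i] for i in range(count)]
    gaps[-1] += modulus  # the last step wraps past the modulus
    return modulus, residues, gaps


modulus, residues, gaps = build_wheel(first_primes(3))
print(modulus)   # 30
print(residues)  # [1, 7, 11, 13, 17, 19, 23, 29]
print(gaps)      # [6, 4, 2, 4, 2, 4, 6, 2]
```

A sieve would then walk candidates by adding the entries of `gaps` in turn, reading them from memory rather than from constants baked into the source.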