I'm not even sure a 32 wide array would be good either since on AMD warps are 64 wide. I wouldn't go fully towards auto vectorization with though.
Warp SIMD-width should be a build-time constant. You'd be using a variable-length vector-like interface that gets compiled down to a specified length as part of building the code.
Warp SIMD-width should be a build-time constant. You'd be using a variable-length vector-like interface that gets compiled down to a specified length as part of building the code.