Every column has its own independent constraint: equality, order, range intersection, bit sets, etc that is evaluated concurrently in single operations. Independent per column in parallel. It does require handling the representation of columns to enable it but that isn’t onerous in practice.
It isn’t intuitive but it is one of those things that is obvious in hindsight once you see how it works. The gap is that people struggle to understand how to make this something SIMD native, especially in high-performance systems.
Ah, so you're just doing SoA or AoSoA layout? It sounded like you where doing something more special than the standard SIMD usecase.
This does easily work with SIMD abstractions and even length-agnostic vector ISAs, unless you're doing AoSoA and your storage format has to match your memory format and it has the be the same on all machines. In which case you probably want to do something like 4K blocks anyways, in which case you can make it agnostic for all vector length anybody reasonably cares about for this type of application anyways.