Is this a technical impossibility or just it hasn't been done yet? Could a library support generating intrinsics for a large set of architectures?
Google Highway gets mentioned in the article.
There is google’s highway, that provides an abstraction layer. It is used by NumPy.
The full scope of what SIMD is used for is much larger than parallelizing evaluation of numeric types and algorithms.
For example, it is used for parallel evaluation of complex constraints on unrelated types simultaneously while packed into a single vector. Think a WHERE clause on an arbitrary SQL schema evaluated in full parallel in a handful of clock cycles. SIMD turns out to be brilliant for this but it looks nothing like auto-vectorization.
None of the SIMD libraries like Google Highway cover this case.