it is not easy for a compiler to vectorize
a pragmatic approach: write in a high-level interpreted language that rhymes with modern CPUs, vector extensions, and memory bandwidth
e.g. apl [0], bqn [1], k [2], kiwi [3]
- vectors are dense (not boxed)
- optimized internal representation (e.g. bitpacked bool vectors)
- primitives act on whole vectors and use simd (avx, neon) where available
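
a rough sketch of the first two points in python (stdlib only). this is an illustration of the layout idea, not how any of these interpreters actually do it: `array` stands in for a dense unboxed vector, and `pack_bools` / `count_true` are made-up helper names for a bitpacked bool vector.

```python
from array import array

# Dense, unboxed storage: one contiguous buffer of 8-byte signed ints,
# unlike a list, which holds pointers to separately allocated objects.
dense = array("q", range(8))


def pack_bools(bools):
    """Pack booleans into bytes, 8 per byte, least significant bit first."""
    bools = list(bools)
    out = bytearray((len(bools) + 7) // 8)
    for i, b in enumerate(bools):
        if b:
            out[i >> 3] |= 1 << (i & 7)
    return bytes(out)


def count_true(packed):
    """Count set bits by treating the whole buffer as one big integer."""
    return bin(int.from_bytes(packed, "little")).count("1")


# Boolean mask "x is a multiple of 3" over the dense vector, stored as bits.
mask = pack_bools(x % 3 == 0 for x in dense)
```

here 8 booleans fit in 1 byte instead of 8 separate objects, and counting the true values becomes a population count over the buffer. an array language implementation would presumably do both steps with native code and simd rather than a python loop.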
[0] https://www.dyalog.com
[1] https://mlochbaum.github.io/BQN/
[2] https://kx.com
[3] https://kiwilang.com

great article by marshall on BQN performance compared to C and how to think about it
https://mlochbaum.github.io/BQN/implementation/versusc.html
related:
- columnar databases: kdb, duckdb, clickhouse
- machine learning frameworks: pytorch, keras, jax, mlx