I recently revisited a language comparison project: a benchmark that tallies, in parallel, the cycle decompositions of all 3,715,891,200 signed permutations on 10 letters. I kept a dozen languages as finalists, with different philosophies but all choices I could imagine making for my research programming. Rather than "ur" languages, I was looking for the best modern realizations of various paradigms. And while I measured performance, I also considered how well AI could help, and my willingness to review and think in the code. I worked hard to optimize each language, a form of tourism made possible by AI.
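For readers unfamiliar with the task, here is a small brute-force Python sketch of what "tallying cycle decompositions" of signed permutations could mean. It assumes (this is not taken from the repository) that each cycle of the underlying permutation is classed as positive or negative by the product of the signs of its letters, so a cycle type is a pair of partitions. The function names are hypothetical, and nothing here is parallel or tuned.

```python
from collections import Counter
from itertools import permutations, product
from math import factorial


def cycle_type(perm, signs):
    """Return (positive cycle lengths, negative cycle lengths), each sorted descending.

    perm is a permutation of range(n) as a tuple; signs[i] is +1 or -1 for letter i.
    Assumption: a cycle counts as negative when the product of its signs is -1.
    """
    n = len(perm)
    seen = [False] * n
    pos, neg = [], []
    for start in range(n):
        if seen[start]:
            continue
        length, sign, i = 0, 1, start
        # Walk the cycle containing `start`, multiplying the signs we pass.
        while not seen[i]:
            seen[i] = True
            sign *= signs[i]
            length += 1
            i = perm[i]
        (pos if sign == 1 else neg).append(length)
    return tuple(sorted(pos, reverse=True)), tuple(sorted(neg, reverse=True))


def tally_signed_cycle_types(n):
    """Serially count all 2**n * n! signed permutations of n letters by cycle type."""
    counts = Counter()
    for perm in permutations(range(n)):
        for signs in product((1, -1), repeat=n):
            counts[cycle_type(perm, signs)] += 1
    return counts


counts = tally_signed_cycle_types(3)
print(len(counts), sum(counts.values()))  # 10 cycle types, 48 signed permutations
print(2**10 * factorial(10))              # 3715891200 signed permutations for n = 10
```

For n = 10 the same loop would visit 2^10 × 10! = 3,715,891,200 signed permutations, which is why the benchmark leans on parallelism and a tight inner loop.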
The results surprised me:
Language       Speed (F# = 100)   Time
F#                          100   19.17s ±0.04s
C++                          96   19.92s ±0.13s
Rust                         95   20.20s ±0.38s
Kotlin                       89   21.51s ±0.04s
Scala                        88   21.68s ±0.04s
Kotlin-native                81   23.69s ±0.11s
Scala-native                 77   24.72s ±0.03s
Nim                          69   27.92s ±0.04s
Julia                        63   30.54s ±0.08s
Swift                        52   36.86s ±0.03s
OCaml                        47   41.10s ±0.10s
Haskell                      40   47.94s ±0.06s
Chez                         39   49.46s ±0.04s
Lean                         10   198.63s ±1.02s
https://github.com/Syzygies/Compare
Naively this is quite surprising, but the devil is in the details. With the exception of Lean, I'd point out that they're all fairly close: Chez being 2.5x slower than C++ is not negligible, but it's also quite good for a dynamically typed, natively compiled language[1]. And I'm not surprised that F# does so well at this particular task. Without having looked into it more closely, I'd guess this is a story about F# on .NET Core having the most mature and painless out-of-the-box parallelism of these languages. I assume these are elapsed (wall-clock) times; it would be interesting to see a breakdown of CPU time.
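One way to get the elapsed-versus-CPU breakdown wished for above, sketched in Python under the assumption of a Unix-like system (the `resource` module is Unix-only). The command being timed is a placeholder; to profile an entry from the table you would substitute that language's compiled benchmark binary, whose name I am not assuming. Dividing total CPU seconds by wall seconds gives a rough average number of busy cores.

```python
import resource
import subprocess
import sys
import time


def wall_and_cpu(cmd):
    """Run cmd to completion; return (wall s, user CPU s, system CPU s).

    CPU times come from RUSAGE_CHILDREN, which only counts child processes
    that have terminated and been waited for (subprocess.run waits).
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return wall, after.ru_utime - before.ru_utime, after.ru_stime - before.ru_stime


# Placeholder workload; swap in a benchmark binary to profile a real entry.
wall, user, sys_cpu = wall_and_cpu([sys.executable, "-c", "sum(range(10**6))"])
print(f"wall {wall:.2f}s  user {user:.2f}s  sys {sys_cpu:.2f}s  "
      f"avg busy cores {(user + sys_cpu) / wall:.2f}")
```

A well-parallelized run on an 8-core machine would show CPU time close to 8× wall time; a large system-time share would hint at scheduling or allocation overhead rather than arithmetic.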
I don't think these results are quite comparable, because the implementations use slightly different parallelism strategies; I'd expect the F# approach of just spinning off threads to be a little more performant than a Rayon parallel iterator, which presumably has some overhead. But that really just shows how hard it is to do a cross-language comparison: Rust and C++ can certainly be made faster than the F# code by carefully manipulating a ton of low-level OS concurrency primitives, though that would arguably also be a little misleading. Likewise, Chez and Haskell have good C FFIs; does that count? It's a tricky and highly qualitative analysis.
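To make the two strategies concrete, here is a minimal Python sketch of the "just spin off threads" approach: the work is split statically up front (one task per choice of the first letter), and each thread fills a private Counter that is merged in the calling thread, so there is no shared state or locking. A work-stealing pool such as Rayon's instead splits work adaptively at run time, which balances load better at the cost of some scheduling overhead. This is not the F# or Rust code from the repository: the function names are hypothetical, it tallies plain permutations by number of fixed points as a stand-in task, and under CPython's GIL these threads will not actually speed up a pure-Python loop, so it shows only the structure.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import permutations


def tally_with_first(n, first, key):
    """Tally key(p) over the permutations p of range(n) with p[0] == first."""
    rest = [i for i in range(n) if i != first]
    local = Counter()  # private to this task: no locks needed
    for tail in permutations(rest):
        local[key((first, *tail))] += 1
    return local


def parallel_tally(n, key, workers=None):
    """Static split: one task per first letter; per-task counters are merged
    in the calling thread as results come back."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for local in pool.map(lambda first: tally_with_first(n, first, key), range(n)):
            total.update(local)  # Counter.update adds counts
    return total


def fixed_points(p):
    """Stand-in statistic: number of letters i with p[i] == i."""
    return sum(1 for i, x in enumerate(p) if i == x)


print(sorted(parallel_tally(4, fixed_points).items()))  # [(0, 9), (1, 8), (2, 6), (4, 1)]
```

With only n top-level tasks, a machine with more than n cores would sit partly idle; a finer static split (for example by the first two letters) or a work-stealing pool avoids that.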
[1] FYI, one possible performance improvement to the Chez code is keeping the permutations in fxvectors and replacing generic math operations with their fixnum-specific equivalents (fx+, fx<, and so on). This tells the compiler that the values are guaranteed to be fixnums (machine-word integers) rather than possibly bignums, so it can emit simple fixnum arithmetic instead of generic arithmetic. I'm not sure without running it myself, but there seem to be avoidable allocations in the Chez implementation. https://cisco.github.io/ChezScheme/csug/objects.html#./objec...