I don't think those benches are much of a flex. Even by the author's own description, you'd be fine with any of them. They all have acceptable performance and don't show any order-of-magnitude differences or non-linear scaling problems.
Further, the benches that perform best there are the non-work-stealing scenarios, not tokio.
I also suspect simply tuning the thread-based workloads more aggressively would have the same effect.
When I profile high-throughput tokio applications, there's way too much contention on shared atomics, mostly inside tokio's scheduler itself. On machines with lower core counts, and where the workload is I/O heavy, this is probably fine. So, yes, web servers.
But I'm very interested in applications that scale on machines with lots of cores and where CPU is a large part of the equation.
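To make the shared-atomics point concrete, here is a minimal sketch using only the standard library. It is not tokio's scheduler code and not taken from any profile; the function names `count_shared` and `count_sharded` are made up for illustration. It just contrasts many threads hammering one atomic with each thread keeping a local count and merging once at the end, which is the general pattern that tends to matter more as core counts grow.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: every thread increments the SAME atomic.
/// On many cores the cache line holding `counter` bounces between them.
pub fn count_shared(threads: usize, iters: u64) -> u64 {
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..iters {
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}

/// Hypothetical helper: each thread counts locally, results are summed once.
/// No shared state is touched inside the hot loop.
pub fn count_sharded(threads: usize, iters: u64) -> u64 {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            thread::spawn(move || {
                let mut local = 0u64;
                for _ in 0..iters {
                    // black_box keeps the compiler from folding the loop away,
                    // so the two variants do comparable per-iteration work.
                    local = std::hint::black_box(local + 1);
                }
                local
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

Both functions return `threads * iters`. Wrapping each call in `std::time::Instant::now()` / `elapsed()` and varying the thread count is one way to see how the shared variant scales on a given machine; the actual numbers depend heavily on the hardware, so none are claimed here.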
You assumed:
> Likely more efficient than half the async runtimes out there.
The benchmark shows the opposite: 2 × 10^8 ns per request (multithreaded async runtime) vs 7 × 10^8 ns per request (threads) at 2k requests/s. That is roughly 0.2 s vs 0.7 s, a 3.5x gap.
> non-linear scaling problems.
Oh, look closely: the relative gap increases with the number of requests/s.