Isn't a dedicated worker pool with priority queues enough to get predictable P99 without leaving Go?
If you fix N workers and control dispatch order yourself, the scheduler barely gets involved — no stealing, no surprises.
The inter-goroutine handoff is ~50-100ns anyway.
Isn't the real issue using `go f()` per request rather than something in the language itself?
No. Eventually the queues fill up and goroutines block while waiting to enqueue, which lands you right back at unfair scheduling.
See https://github.com/php/frankenphp/pull/2016 for a “correctly behaving” implementation that goes to 100% CPU usage under contention.
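To make the queue-full point concrete, here is a minimal sketch, not taken from the linked PR. The names `trySubmit` and `fillUntilFull` are made up for illustration. It shows that a bounded queue with no consumer keeping up accepts exactly its capacity and then refuses. A plain `queue <- job` at that point would park the producer instead of returning false, and from there it is the runtime, not your dispatch order, that decides when it runs again.

```go
package main

// trySubmit attempts a non-blocking enqueue and reports whether the job
// was accepted. A plain `queue <- job` would block here instead.
func trySubmit(queue chan<- int, job int) bool {
	select {
	case queue <- job:
		return true
	default:
		// Buffer is full (or unbuffered with no receiver waiting).
		return false
	}
}

// fillUntilFull enqueues jobs with no consumer running and returns how
// many were accepted before the next send would have blocked.
func fillUntilFull(capacity int) int {
	queue := make(chan int, capacity)
	accepted := 0
	for trySubmit(queue, accepted) {
		accepted++
	}
	return accepted
}
```

With a capacity of 4, the fifth submit is the one that would stall a real producer.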
My use case was an append-only blob store with mandatory encryption. Using a semaphore + direct goroutine calls to limit background write concurrency, instead of a channel + dedicated writer goroutines, was a net win across a wide variety of write sizes and max concurrent in-flight writes. It is interesting that FrankenPHP + Caddy came to almost the same conclusion despite doing vastly different work.