I enjoyed both these GopherCon talks:
GopherCon 2018: The Scheduler Saga - Kavya Joshi https://www.youtube.com/watch?v=YHRO5WQGh0k
GopherCon 2017: Understanding Channels - Kavya Joshi https://www.youtube.com/watch?v=KBZlN0izeiY
> a goroutine’s state is surprisingly small. The mcall() assembly function only saves 3 values — the stack pointer, the program counter, and the base pointer — into a tiny gobuf struct. That’s it. Why so few? Because goroutine switches happen at function call boundaries, and at those points the compiler has already spilled any important registers to the stack following normal calling conventions.
Wouldn’t that mean Go never uses registers to pass arguments to functions?
If so, that seems in conflict with https://go.dev/src/cmd/compile/abi-internal#function-call-ar..., which says “Because access to registers is generally faster than access to the stack, arguments and results are preferentially passed in registers”
Or does the compiler always use Go’s stable ABI, known as ABI0, in functions where it inserts code that can context switch, and only use the (potentially) faster ABI that passes arguments in registers elsewhere?
Go missed a big opportunity to be Rust when we needed Rust more than anything. I have long since moved on from Go. C#/.NET is widely available nowadays and is in many respects less held back by strange political choices when it comes to DevEx (I am, of course, talking about generics).
Isn't a dedicated worker pool with priority queues enough to get predictable P99 without leaving Go?
If you fix N workers and control dispatch order yourself, the scheduler barely gets involved — no stealing, no surprises.
The inter-goroutine handoff is ~50-100ns anyway.
Isn't the real issue using `go f()` per request rather than something in the language itself?
The unfair scheduling point resonates. I run a lot of concurrent HTTP workloads in Go (scraping, data pipelines) and the scheduler is honestly fine for throughput-oriented work where you don't care about tail latency. But the moment you need consistent response times under load it becomes a real problem. GOMAXPROCS tuning and runtime.LockOSThread help in narrow cases but they're band-aids. The lack of priority or fairness knobs is a deliberate design choice but it does push certain workloads toward other runtimes.
This is an excellent idea as a blog. Kudos!
My biggest issue with Go is its incredibly unfair scheduler. No matter what load you have, P99 and especially P99.9 latency will be higher than in any other language. The way that it steals work guarantees that requests “in the middle” will be served last.
It’s a problem that only Go can solve, but solving it means giving up some speed on requests that are currently handled immediately but shouldn’t be. So overall latency would go up and P99 would drop precipitously. Thus, they’ll probably never fix it.
If you have a system that requires predictable latency, Go is not the right language for it.