Seems similar to the SEDA architecture https://en.wikipedia.org/wiki/Staged_event-driven_architectu...
Slightly tangential, but why is the first diagram duplicated at .1 opacity?
discussed in 2016: https://news.ycombinator.com/item?id=10872209 (53 comments)
> One thread per core, pinned (affinity) to separate CPUs, each with their own epoll/kqueue fd
> Each major state transition (accept, reader) is handled by a separate thread, and transitioning one client from one state to another involves passing the file descriptor to the epoll/kqueue fd of the other thread.
So this seems like a little pipeline that all of the requests go through, right? For somebody who doesn’t do server stuff, is there a general idea of how many stages a typical server might be able to implement? And does it create a load-balancing problem? I’d expect some stages to be quite cheap…
Always interesting to review the latest techempower web framework benchmarks, though it's been a year:
https://www.techempower.com/benchmarks/#section=data-r23&tes...
I haven't seen an sdf1.org URL in a looooong time. Lovely to see it's still around.

This is more or less what Erlang does, and part of why Erlang is so easy to scale.
It’s an interesting throwback to SEDA, but physically passing file descriptors between different cores as a connection changes state is usually a performance killer on modern hardware. While it sounds elegant on a whiteboard to have a dedicated 'accept' core and a 'read' core, you end up trading a slightly simpler state machine for massive L1/L2 cache thrashing. Every time you hand off that connection, you immediately invalidate the buffers and TCP state you just built up. There’s a reason the industry largely settled on shared-nothing architectures like NGINX: having a single pinned thread handle the entire lifecycle of a request keeps all that data strictly local to the CPU cache. When you're trying to scale, respecting data locality almost always beats pipeline cleanliness.
A bit dated in the sense that for Linux you'd probably use io_uring nowadays, but otherwise it's a timeless design
Still, I'm conflicted on whether separating stages per thread (accept on one thread and the client loop on another) is a good idea. It sounds like the gains would be minimal or non-existent even in ideal circumstances, and on workloads where there aren't many clients or much connection churn, it would waste an entire core handling a low-volume event.
I'm open to contrarian opinions on this though; maybe I'm not seeing something...