Hacker News

Fast-Servers

94 points by tosh · today at 2:11 PM · 28 comments

Comments

luizfelberti · today at 3:40 PM

A bit dated in the sense that for Linux you'd probably use io_uring nowadays, but otherwise it's a timeless design

Still, I'm conflicted on whether separating stages per thread (accept on one thread, the client loop on another) is a good idea. It sounds like the gains would be minimal or non-existent even in ideal circumstances, and on workloads without many clients or much connection churn it would waste an entire core handling a low-volume event.

I'm open to contrarian opinions on this though, maybe I'm not seeing something...

kogus · today at 3:14 PM

Slightly tangential, but why is the first diagram duplicated at .1 opacity?

ratrocket · today at 3:17 PM

discussed in 2016: https://news.ycombinator.com/item?id=10872209 (53 comments)

bee_rider · today at 3:38 PM

> One thread per core, pinned (affinity) to separate CPUs, each with their own epoll/kqueue fd

> Each major state transition (accept, reader) is handled by a separate thread, and transitioning one client from one state to another involves passing the file descriptor to the epoll/kqueue fd of the other thread.

So this seems like a little pipeline that all of the requests go through, right? For somebody who doesn’t do server stuff, is there a general idea of how many stages a typical server might be able to implement? And does it create a load-balancing problem? I’d expect some stages to be quite cheap…

password4321 · today at 5:27 PM

Always interesting to review the latest techempower web framework benchmarks, though it's been a year:

https://www.techempower.com/benchmarks/#section=data-r23&tes...

rot13maxi · today at 4:39 PM

i haven't seen an sdf1.org url in a looooong time. lovely to see it's still around

fao_ · today at 4:08 PM

this is more or less what Erlang does, and part of why Erlang is so easy to scale.

epicprogrammer · today at 4:11 PM

It’s an interesting throwback to SEDA, but physically passing file descriptors between different cores as a connection changes state is usually a performance killer on modern hardware. While it sounds elegant on a whiteboard to have a dedicated 'accept' core and a 'read' core, you end up trading a slightly simpler state machine for massive L1/L2 cache thrashing. Every time you hand off that connection, you immediately invalidate the buffers and TCP state you just built up. There’s a reason the industry largely settled on shared-nothing architectures like NGINX: having a single pinned thread handle the entire lifecycle of a request keeps all that data strictly local to the CPU cache. When you're trying to scale, respecting data locality almost always beats pipeline cleanliness.
