Today's state-of-the-art compilers can't even do vectorized integer division by a compile-time constant very well. They definitely can't map high-level constructs onto low-level patterns, and they don't carry anywhere near enough semantic information through the optimization passes to take even very safe, simple, sane shortcuts with zero possibility of UB or other issues. There's a lot of performance being left on the table.
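To make the division point concrete, here's a minimal sketch of doing the strength reduction by hand: dividing 16-bit values by the constant 10 with a multiply and a shift, in a loop shape that is friendly to vectorization. The function name `div10_u16` is made up for illustration, and whether your compiler turns this into good SIMD still depends on the compiler, version, target and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: divide every 16-bit element by 10 without a
 * divide instruction.
 * 0xCCCD == ceil(2^19 / 10). The 32-bit product cannot overflow
 * (65535 * 0xCCCD < 2^32), and the result is exact for every
 * uint16_t input. */
void div10_u16(uint16_t *dst, const uint16_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t wide = (uint32_t)src[i] * 0xCCCDu;
        dst[i] = (uint16_t)(wide >> 19);
    }
}
```

On x86 the same idea maps onto a 16-bit high multiply followed by a right shift of 3, which is the kind of pattern you'd hope to get for free from plain `x / 10` in a vectorized loop.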
Mind you, somebody who's sympathetic to the machine's needs can scrape most of that performance back by writing C/C++/Zig in a way that maps easily to the optimal assembly. The optimizer won't make your code drastically worse too often, so if you start with something nice, actually dropping down into assembly has limited use cases and usually limited benefits... if you know what you're doing and throw out every style guide as you do so.
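As a rough illustration of what "maps easily to the optimal assembly" can look like, here's a sketch of a saturating byte add written so the loop body is a straight-line select with no aliasing between arrays. The name `add_saturate_u8` is hypothetical, and the assumption that a given compiler recognizes this as a saturating SIMD add is exactly that, an assumption; check the generated assembly for your toolchain.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst[i] = min(a[i] + b[i], 255).
 * `restrict` promises the arrays don't overlap, and the body is a
 * simple select on a widened sum, which gives the optimizer a shape
 * it has a chance of mapping to a vector saturating add. */
void add_saturate_u8(uint8_t *restrict dst,
                     const uint8_t *restrict a,
                     const uint8_t *restrict b,
                     size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

Nothing here is clever; the point is that the source already looks like the instruction you want, so the optimizer has very little to figure out.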
As to this server in particular? At first blush it looks more like a learning exercise. You'll go a lot further with clever incremental routines and appropriately leveraging your OS's async API than you will by shaving a few instructions here and there.
As to servers in general? Your kernel is the real bottleneck. If you need all of its features then you don't have a lot of options, but if you're like most applications then you're leaving a ton of performance on the floor by not going for kernel bypass (not that using your kernel for networking is a _bad_ decision, but you are nevertheless incurring a 10x-50x performance hit as the cost). Assembly shenanigans literally don't matter in comparison.