
Notes from Optimizing CPU-Bound Go Hot Paths

17 points by nnx last Tuesday at 4:46 AM | 3 comments

Comments

nasretdinov today at 9:20 AM

I think all the points the author complains about are valid. Some libraries, like Arrow, chose a completely different approach to supporting Go: https://github.com/apache/arrow-go#performance . They effectively compile C code to Go assembly, which avoids paying the cost of cgo and gives them the much finer control over performance and data structure layout that C offers. I am obviously not suggesting that the author should do the same, but it does confirm that achieving the last 50% of the performance is sometimes an impossibly hard task in Go.

sylware today at 11:10 AM

I write assembly following the static branch prediction rule (on RISC-V): if a branch is not yet registered in the predictor, a conditional forward branch is predicted not taken and a conditional backward branch is predicted taken.

Mechanically, this pushes the 'unlikely' code further down, and with semantic knowledge of how 'hot' each piece of code is, you can very easily refactor (assembly) code hierarchically to favor the 'likely' code, across a whole spectrum of intensity.

Basically, almost all of the 'expected' code ends up nicely packed and predicted, and you can do that at various scales (yes, it works for huge code paths).

It is beautiful :)

I wonder if x86_64 hardware follows that rule (I think I read that it does for Intel, but not for AMD).
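For readers who want to try the same layout idea in Go rather than assembly, here is a minimal sketch. It assumes the rule described above (forward branch predicted not taken) and keeps the common case as the fall-through path, with the rare case moved into an out-of-line helper. The names `sumNonNegative` and `coldNegative` are made up for illustration. The Go compiler is free to reorder basic blocks, so source order is only a hint; inspect the output of `go build -gcflags=-S` to see the layout you actually get.

```go
package main

import "fmt"

// sumNonNegative adds up vals and fails on the first negative entry.
// The common case (v >= 0) is written as the straight-line path; the
// rare case sits behind a forward branch into a cold helper.
func sumNonNegative(vals []int) (int, error) {
	total := 0
	for _, v := range vals {
		if v < 0 { // rare case: forward branch out of the hot loop
			return 0, coldNegative(v)
		}
		total += v // hot path falls through
	}
	return total, nil
}

// coldNegative builds the error for the rare case. The noinline
// directive asks the compiler to keep this code out of the loop body.
//
//go:noinline
func coldNegative(v int) error {
	return fmt.Errorf("negative value %d", v)
}
```

Whether this helps on a given CPU depends on its predictor, so treat it as something to measure, not a guaranteed win.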

coldstartops today at 7:25 AM

Nice! Now the question is: how many classic Reynolds boids can you run on 1 CPU at 60 FPS, without using any goroutines?

I managed to get around 8192 using Serge Skoredin's approach from the blog post last year: https://skoredin.pro/blog/golang/cpu-cache-friendly-go

I also tried some of the techniques from this blog post and managed to squeeze out a bit more.
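For context, one cache-friendly layout commonly used for this kind of single-threaded simulation is struct-of-arrays. The sketch below is an illustration only: it is not necessarily what either linked post does, and the `Flock` type and its methods are hypothetical names. It shows positions and velocities stored in parallel slices so that each pass streams through contiguous memory on one goroutine.

```go
package main

// Flock stores boid state as parallel slices (struct-of-arrays), so a
// pass that only reads positions touches contiguous memory.
type Flock struct {
	X, Y   []float32 // positions
	VX, VY []float32 // velocities
}

// NewFlock allocates state for n boids, all zero-initialized.
func NewFlock(n int) *Flock {
	return &Flock{
		X: make([]float32, n), Y: make([]float32, n),
		VX: make([]float32, n), VY: make([]float32, n),
	}
}

// Integrate advances every boid by dt seconds on the calling goroutine.
func (f *Flock) Integrate(dt float32) {
	for i := range f.X {
		f.X[i] += f.VX[i] * dt
		f.Y[i] += f.VY[i] * dt
	}
}

// Centroid returns the mean position, as used by the cohesion rule.
func (f *Flock) Centroid() (cx, cy float32) {
	n := len(f.X)
	if n == 0 {
		return 0, 0
	}
	for i := range f.X {
		cx += f.X[i]
		cy += f.Y[i]
	}
	return cx / float32(n), cy / float32(n)
}
```

A full boids step would add separation and alignment passes plus some spatial partitioning for neighbor lookups; those are left out here to keep the layout itself visible.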