Generally, when I want to run something with that much parallelism, I just write a small Go program instead and let Go's runtime handle the scheduling. It works remarkably well, and there's no execve() overhead either.
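For illustration, here is a minimal sketch of that approach. It assumes the per-item job can be written as a plain Go function. `work` and `runAll` are hypothetical names, and `work` just SHA-256s a string as a stand-in for whatever you'd otherwise launch as one process per item. The program starts one goroutine per item, and the runtime multiplexes them onto at most GOMAXPROCS threads running Go code at once. Nothing is exec'd.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"runtime"
	"sync"
)

// work is a hypothetical placeholder for the per-item job you might
// otherwise run as a separate process (one execve() per item).
func work(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}

// runAll starts one goroutine per item and waits for all of them.
// Goroutines are cheap, so the Go scheduler spreads them across
// at most GOMAXPROCS threads executing Go code simultaneously.
func runAll(items []string) []string {
	results := make([]string, len(items))
	var wg sync.WaitGroup
	for i, it := range items {
		wg.Add(1)
		go func(i int, it string) {
			defer wg.Done()
			// Each goroutine writes only its own slot, so no locking is needed.
			results[i] = work(it)
		}(i, it)
	}
	wg.Wait()
	return results
}

func main() {
	items := make([]string, 100000)
	for i := range items {
		items[i] = fmt.Sprintf("item-%d", i)
	}
	out := runAll(items)
	fmt.Printf("hashed %d items using GOMAXPROCS=%d\n", len(out), runtime.GOMAXPROCS(0))
}
```

If the job touches a limited resource (file descriptors, a remote API), you'd probably want to cap concurrency with a fixed pool of workers reading from a channel instead of one goroutine per item.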
dang and u did all that without a 10 year journey