So if you need speed, you just have to swallow your OO programmer's pride and put your data in arrays.
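For anyone who hasn't seen it spelled out, here is a rough sketch of what "put your data in arrays" tends to mean in practice. The `Particle` and `Particles` types and the `sum_x_*` functions are made up for illustration; nothing here comes from a specific codebase.

```cpp
#include <cstddef>
#include <vector>

// Array of structs (AoS): the usual OO layout. Each particle's fields sit
// next to each other, so a loop that only reads `x` still pulls `y`, `z`,
// and `mass` through the cache.
struct Particle {
    float x, y, z, mass;
};

float sum_x_aos(const std::vector<Particle>& particles) {
    float total = 0.0f;
    for (const Particle& p : particles) total += p.x;
    return total;
}

// Struct of arrays (SoA): one contiguous array per field, so a loop over
// `x` touches only `x`.
struct Particles {
    std::vector<float> x, y, z, mass;

    void push_back(const Particle& p) {
        x.push_back(p.x);
        y.push_back(p.y);
        z.push_back(p.z);
        mass.push_back(p.mass);
    }

    std::size_t size() const { return x.size(); }
};

float sum_x_soa(const Particles& particles) {
    float total = 0.0f;
    for (float v : particles.x) total += v;
    return total;
}
```

Both functions compute the same thing. Whether the SoA version is actually faster depends on how many fields the hot loop touches and on the hardware, so measure before committing to it.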
And avoid moving said data between physical threads as much as possible.
Most of the bottlenecks I see are not due to the organization of data. Unnecessary communication of data is the #1 offender.
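A minimal sketch of one way to cut down on that communication, assuming a plain sum over a vector: each thread accumulates into a private local variable and writes to shared memory exactly once at the end. `parallel_sum` is a hypothetical helper written for this comment, not an API from any library.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums `data` with `num_threads` workers, each owning one contiguous chunk.
long long parallel_sum(const std::vector<int>& data, unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> workers;
    // Round up so every element falls into some chunk.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&data, &partial, chunk, t] {
            const std::size_t begin = std::min(data.size(), t * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            long long local = 0;  // thread-private accumulator
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            partial[t] = local;   // one write to shared memory per thread
        });
    }
    for (std::thread& w : workers) w.join();

    // Combine the per-thread results on the calling thread.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

The version to avoid is the one where every iteration does `shared_total += data[i]` on an atomic or under a lock: same result, but the cache line holding the total bounces between cores on every add.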
Maybe someone can write an OO language where arrays of structs are automatically stored as structs of arrays.
mild /s
If you have hot loops with millions of iterations at a time, structure your code accordingly. It's not anti-OO to choose the right data structure for the job.