And avoid moving said data between physical threads as much as possible.
Most of the bottlenecks I see are not due to the organization of data. Unnecessary communication of data is the #1 offender.
Working set and algorithm diagonalization (work independence) FTW. Immutable data structures and copying often helps to avoid cache invalidation penalties.
Working set and algorithm diagonalization (work independence) FTW. Immutable data structures and copying often helps to avoid cache invalidation penalties.