I'm somewhat dubious about anything that discusses low-level, instruction-level performance programming without distinguishing between latency and throughput, never mind mentioning the heavily out-of-order nature of modern desktop- and server-class CPU cores.
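
To make the distinction concrete, here is a toy model in Python. It is only a sketch under made-up assumptions: the function name `cycles_to_finish`, the 4-cycle latency, and the issue width of 2 are all hypothetical and don't describe any real core. Each operation takes `latency` cycles to produce its result, the machine can start up to `issue_width` operations per cycle, and an operation can only start once the previous one in its dependency chain has finished.

```python
def cycles_to_finish(chain_count, chain_length, latency, issue_width):
    """Cycles until every op has completed, in a toy scheduling model.

    Assumes latency >= 1 and issue_width >= 1 (hypothetical parameters).
    Ops within a chain depend on each other; separate chains are independent.
    """
    remaining = [chain_length] * chain_count  # ops still to issue, per chain
    ready_at = [0] * chain_count              # earliest cycle each chain may issue next
    cycle = 0
    finish = 0
    while any(remaining):
        issued = 0
        for c in range(chain_count):
            if issued == issue_width:
                break  # issue slots for this cycle are used up
            if remaining[c] and ready_at[c] <= cycle:
                remaining[c] -= 1
                ready_at[c] = cycle + latency  # result available after `latency` cycles
                finish = max(finish, ready_at[c])
                issued += 1
        cycle += 1
    return finish


# Eight ops in one dependent chain: bound by latency (8 * 4 cycles).
print(cycles_to_finish(chain_count=1, chain_length=8, latency=4, issue_width=2))  # 32

# The same eight ops, all independent: bound by throughput (4 issue cycles + drain).
print(cycles_to_finish(chain_count=8, chain_length=1, latency=4, issue_width=2))  # 7
```

Same instruction, same count, and a difference of more than 4x depending purely on the dependency structure. That's the sort of thing a per-instruction "cost" number can't capture, and an out-of-order core is what lets the independent case overlap like this without the programmer scheduling it by hand.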