If you need callbacks and generics, you're not writing performance code.
99% of code in the wild is comically inefficient and is doing the wrong thing, using way too generic data structures and algorithms for very concrete problems. C++ templates may be one way to make comically slow code faster by spending a lot of compile time. But it's often much quicker to just write straightforward concrete code that the compiler can easily optimize.
IMO C++ makes for slow programs for the sole fact that it compiles so slow (if you use its modern features), so you have much less time to actually iterate and improve.
If compilation is even more than 10% of the time it takes you to run your tests, you're probably not writing correct code. Compilation times don't even measure.