My bad, I must have confused it with something else. Yes, it uses ptrace, so there is definitely some overhead around system calls (the traced process stops and hands control to the tracer), but ordinary instructions still run natively, which should still be faster than running on top of a full CPU emulator. That being said, I haven't benchmarked it myself; I just remember it being pretty snappy.
Thanks for your correction!