The correct choice IMHO is type-erasure. It does not necessarily have overhead, because optimizers can specialize or devirtualize. Of course, this may depend on how you implement your language, but in C this works nicely. The problem with monomorphization is that it can lead to exponential code expansion, which later compiler passes cannot merge back together (or at least doing so is much harder than not expanding in the first place). It should also fundamentally limit what you can do, because expansion has to stop at some point, but I haven't thought about this too much.
I also think that where you want monomorphization, macros seem fine. I do not think this necessarily has to be clunky, but this is just a guess.
I don't think the numbers bear out that "this works nicely" in C; it seems like you get worse performance in some common cases, like sorting?
Type-erasure does have an inherent overhead. Sure, optimizations can be made, but they can be fickle, and specialization is basically implicit monomorphization.
Using C macros to replicate Rust's monomorphization has several drawbacks: they are inherently unhygienic, even compared to Rust's own macros; you can't express type bounds; they aren't even part of C proper but a separate text-substitution step run by the preprocessor, etc.
I prefer Rust's approach, which gives you the choice between generics, macros, dyn, and Any.