I think you are right in the interpretation. In my case though dynamic dispatch was already integrated with bytecode lowering so the error wasn't helpful :(
Ah, so you want the function containing the interpreter loop to be compiled for the baseline architecture but some of the bytecode implementations inside it to use more advanced intrinsics? Yeah, I don’t think GCC has a good answer to that one. It also sounds gnarly from a general calling-convention perspective—how is the VZEROUPPER on exit supposed to be emitted if you can’t count on AVX?..
We recently (finally) got __attribute__((musttail)) in GCC[1], I’ve just tried it between functions with mismatched __attribute__((target))s and it does work, so theoretically you could code your interpreter that way. But it seems like you’re still bound to keep loading and storing vector state from and to memory and VZEROUPPERing your registers after each bytecode, and that doesn’t sound like a particularly good time.
Ah, so you want the function containing the interpreter loop to be compiled for the baseline architecture but some of the bytecode implementations inside it to use more advanced intrinsics? Yeah, I don’t think GCC has a good answer to that one. It also sounds gnarly from a general calling-convention perspective—how is the VZEROUPPER on exit supposed to be emitted if you can’t count on AVX?..
We recently (finally) got __attribute__((musttail)) in GCC[1], I’ve just tried it between functions with mismatched __attribute__((target))s and it does work, so theoretically you could code your interpreter that way. But it seems like you’re still bound to keep loading and storing vector state from and to memory and VZEROUPPERing your registers after each bytecode, and that doesn’t sound like a particularly good time.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119616