> you seem to be narrowing your argument to "well for simple math it's less efficient" but that's not the argument being made at all.
What? Unless the thing you want to compute happens to be exactly that eml() function (no multiplication, no addition, no subtraction unless it's an exponential minus a log, etc.) or almost so, it is unquestionably less efficient. If you believe otherwise, then please provide the eml() implementation of a practically useful function of your choice (e.g. that Arrhenius rate). Then we can count the superfluous transcendental function evaluations vs. a conventional implementation, and try to understand what benefit could outweigh them.
> A 10-stage EML pipeline would be about the size of an avx-512 instruction block on a modern CPU
Can you explain where you got that conclusion? And what do you think a "10-stage EML pipeline" would be useful for? Remember that the multiply embedded in your Arrhenius rate is already 8 layers and 12 operations.
Also, can you confirm whether you're working with an LLM here? You're making a lot of unsupported and oddly specific claims that don't make sense to me, and I'm trying to understand where they're coming from.