They say "embedding V8 directly into Go (to avoid the network hop)" was only an "incremental improvement".
I'm very curious why this didn't help more. That was my first thought. Maybe they didn't get the result they wanted immediately so gave up before evaluating this fully?
JS runtimes are heavy, so embedding one instantly adds at least 30-50 MB of RAM usage. Imagine you do this just for one specific function (JSON processing) while your total RAM budget for a whole pod is around 256 MB.
No doubt this approach would work reasonably well on machines with plenty of RAM, but I can see why it becomes a bottleneck when scaled to N instances. RAM is expensive, and when you multiply those 50 extra megabytes by N, your total costs climb quickly.
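
To put rough numbers on it, here is a back-of-the-envelope sketch in Go. The 30-50 MB overhead and the 256 MB pod budget are the estimates from above; the 1000-pod fleet size and the function names are made up purely for illustration, not anything from the article.

```go
package main

import "fmt"

// extraRAMMB returns the total additional memory, in MB, when each of
// `instances` pods pays `perInstanceMB` for an embedded JS runtime.
func extraRAMMB(perInstanceMB, instances int) int {
	return perInstanceMB * instances
}

// budgetSharePercent returns the percentage of a pod's memory budget
// that the runtime overhead consumes.
func budgetSharePercent(overheadMB, budgetMB int) float64 {
	return float64(overheadMB) / float64(budgetMB) * 100
}

func main() {
	const budgetMB = 256 // per-pod RAM budget (estimate from the comment above)
	const pods = 1000    // hypothetical fleet size, chosen only for illustration

	// 30 and 50 MB are the low and high ends of the estimated runtime overhead.
	for _, overheadMB := range []int{30, 50} {
		fmt.Printf("%d MB overhead: %.1f%% of a %d MB pod, %d MB extra across %d pods\n",
			overheadMB,
			budgetSharePercent(overheadMB, budgetMB),
			budgetMB,
			extraRAMMB(overheadMB, pods),
			pods)
	}
}
```

Under those assumptions the runtime alone eats roughly 12% to 20% of each pod's budget before it does any useful work, and the fleet-wide total grows linearly with N.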