This is an oft-overlooked point. An obvious way to improve on fork+execve is to see whether posix_spawn can be built on more efficient kernel mechanisms.
And of course that has already been done. On NetBSD, posix_spawn() is a fully-fledged system call and much of the work is done in kernel mode.
* https://blog.netbsd.org/tnf/entry/posix_spawn_syscall_added
This is literally discussed in the article this post links to.
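For anyone who hasn't used the two interfaces side by side, here is a minimal sketch in C. The helper names `spawn_and_wait` and `fork_exec_and_wait` are hypothetical, made up for illustration. The point is that the caller hands posix_spawn() the whole "create a child running this program" request in one call, so the implementation is free to do it however is cheapest on that platform (a real system call on NetBSD, or libc code built on other primitives elsewhere). With fork+execve, the parent always creates a copy of itself first and then replaces it.

```c
#define _POSIX_C_SOURCE 200809L
#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: wait for pid and return its exit status, or -1. */
int wait_for_exit(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical helper: run path with argv via posix_spawn() and wait. */
int spawn_and_wait(const char *path, char *const argv[])
{
    pid_t pid;
    /* No file actions or attributes: the child simply inherits our state. */
    int rc = posix_spawn(&pid, path, NULL, NULL, argv, environ);
    if (rc != 0) {
        /* posix_spawn returns an error number rather than setting errno. */
        fprintf(stderr, "posix_spawn: %s\n", strerror(rc));
        return -1;
    }
    return wait_for_exit(pid);
}

/* Hypothetical helper: the same thing done the classic way, fork + execve. */
int fork_exec_and_wait(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child: replace the copied process image with the new program. */
        execve(path, argv, environ);
        _exit(127); /* only reached if execve failed */
    }
    return wait_for_exit(pid);
}
```

Calling either one with `"/bin/sh"` and an argv of `{"sh", "-c", "exit 3", NULL}` should return 3. The observable behaviour is the same; what differs is how much work the parent and kernel have to do to get there.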