I've wondered for a long time whether we could have made do without protected mode (or hardware protection in general) if user code were verified/compiled at load, e.g. the way the JVM or .NET do it... Could the resulting shift in transistor budget have been used to offset any performance losses?
I think the interesting thing about having protection in software is you can do things differently, and possibly better. Computers of yesteryear had protection at the individual object level (e.g. https://en.wikipedia.org/wiki/Burroughs_Large_Systems). This was too expensive to do in 1970s hardware and so performance sucked. Maybe it could be done better in software with more modern optimizing compilers and perhaps a few bits of hardware acceleration here and there? There's definitely an interesting research project to be done.
I looked into that and concluded the spoiler is Spectre.
Basically, you have to have out-of-order/speculative execution if you ultimately want the best performance on general/integer workloads. And once you have that, timing information is going to leak from one process into another, and that timing information can be used to infer the contents of memory. As far as I can see, there is no way to block this in software. There is no substitute for the CPU knowing 'that page should not be accessible to this process, activate timing leak mitigation'.
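To make that concrete, here is a minimal sketch of the bounds-check-bypass pattern usually associated with Spectre variant 1. The names (`table`, `probe`, `checked_lookup`, `PROBE_STRIDE`) are made up for illustration, and this is only the victim-side shape, not a working exploit. The point is that the function is correct as far as any load-time verifier can tell: the bounds check dominates both loads.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_LEN 16
#define PROBE_STRIDE 4096 /* assumed spacing so each byte value maps to its own page */

static uint8_t table[TABLE_LEN];
static uint8_t probe[256 * PROBE_STRIDE];

/* Architecturally safe: an out-of-range index never returns table data. */
uint8_t checked_lookup(size_t index)
{
    if (index < TABLE_LEN) {
        /* If the branch above is mispredicted, a speculating CPU may
         * execute these two loads with an out-of-range index before the
         * check resolves. The results are discarded, but the cache line
         * of probe[] that was touched depends on the byte that was read,
         * and that footprint can later be measured by timing. */
        uint8_t value = table[index];
        return probe[(size_t)value * PROBE_STRIDE];
    }
    return 0;
}
```

Nothing here violates the language's or the verifier's rules, which is why checking the code at load time does not, on its own, close the leak.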
Microsoft Research had an experimental OS project at one point that did just that, with everything running in ring 0 in the same address space:
https://en.wikipedia.org/wiki/Singularity_(operating_system)
Isolation relied on managed code, the properties of their C#-derived programming language, and static analysis and verification, rather than on hardware-enforced protection.
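For a feel of what "verify at load, then run without hardware checks" means, here is a toy sketch. The instruction set, `verify`, and `run` are all invented for this example and are far simpler than what the JVM, .NET, or Singularity actually do. The idea it illustrates is just that, once every memory operand and jump target has been checked up front, the interpreter loop itself needs no bounds checks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_LEN 8 /* size of the sandboxed memory, chosen arbitrarily */

enum op { OP_LOAD, OP_STORE, OP_ADD, OP_JMP, OP_HALT };

struct insn {
    enum op op;
    int32_t arg;
};

/* Load-time check: reject any program that could touch memory outside
 * mem[0..MEM_LEN) or transfer control outside the program. */
bool verify(const struct insn *prog, size_t len)
{
    if (len == 0)
        return false;
    for (size_t pc = 0; pc < len; pc++) {
        int32_t arg = prog[pc].arg;
        switch (prog[pc].op) {
        case OP_LOAD:
        case OP_STORE:
            if (arg < 0 || arg >= MEM_LEN)
                return false;
            break;
        case OP_JMP:
            /* Forward-only jumps, so a verified program always halts. */
            if (arg < 0 || (size_t)arg <= pc || (size_t)arg >= len)
                return false;
            break;
        case OP_ADD:
        case OP_HALT:
            break;
        default:
            return false; /* unknown opcode */
        }
    }
    /* Execution must not be able to fall off the end. */
    return prog[len - 1].op == OP_HALT;
}

/* Precondition: verify(prog, len) returned true. No bounds checks here;
 * the verifier has already ruled out every bad access. */
uint32_t run(const struct insn *prog, uint32_t *mem)
{
    uint32_t acc = 0;
    size_t pc = 0;
    for (;;) {
        struct insn in = prog[pc++];
        switch (in.op) {
        case OP_LOAD:  acc = mem[in.arg]; break;
        case OP_STORE: mem[in.arg] = acc; break;
        case OP_ADD:   acc += (uint32_t)in.arg; break;
        case OP_JMP:   pc = (size_t)in.arg; break;
        case OP_HALT:  return acc;
        }
    }
}
```

A program that stores to slot 8 is rejected by `verify` before it ever runs, which is the role a page fault would otherwise play.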