Don't blame the ISA - blame the silicon implementations AND the software with no architecture-specific optimisations.
RISC-V will get there, eventually.
I remember that ARM started as a speed demon with conscious power consumption, then was surpassed by x86s and PPCs on desktops and moved to embedded, where it shone by being very frugal with power, only to now be leaving the embedded space with implementations optimised for speed more than power.
> RISC-V will get there, eventually.
Not trolling: I legitimately don't see why this is assumed to be true. It is one of those things that is true only once it has been achieved. Otherwise we would be able to create super high performance Sparc or SuperH processors, and we don't.
As you note, Arm once was fast, then slow, then fast. RISC-V has never actually been fast. It has enabled surprisingly good implementations by small numbers of people, but competing at the high end (mobile, desktop or server) it is not.
There's the ARM video from LowSpecGamer, where they talk about how they forgot to connect power to the chip, and it was still executing code anyway. According to Steve Furber, the chip was accidentally being powered from the protection diodes alone. So ARM was incredibly power efficient from the very beginning.
Marcin is working with us on RISC-V enablement for Fedora and RHEL, he's well aware of the problem with current implementations. We're hopeful that this'll be pretty much resolved by the end of the year.
> AND the software with no architecture-specific optimisations
The optimizations that'd be applied to ARM and MIPS would be equally applicable to RISC-V. I do not believe this is a lack of software optimization issue.
We are well past the days where hand written assembly gives much benefit, and modern compilers like gcc and llvm do nearly identical work right up until it comes to instruction emissions (including determining where SIMD instructions could be placed).
Unless these chips have very very weird performance characteristics (like the weirdness around x86's lea instruction being used for arithmetic) there's just not going to be a lot of missed heuristics.
If you make a spec that the wider industry cannot effectively implement into quality products, it's the spec that's wrong. And that's true for anything - whether it's RISC-V, ipv6, Matter, USB-C and so on.
That's what makes writing specs hard - you need people who understand implementation challenges at the table, not dreaming architects and academics.
> Don't blame the ISA - blame the silicon implementations
That's true, but tautological.
The issue is that the RISC-V core is the easy part of the problem, and nobody seems to even be able to generate a chip that gets that right without weirdness and quirks.
The more fundamental technical problem is that things like the cache organization and DDR interface and PCI interface and ... cannot just be synthesized. They require analog/RF VLSI designers doing things like clock forwarding and signal integrity analysis. If you get them wrong, your performance tanks, and, so far, everybody has gotten them wrong in various ways.
The business problem is the fact that everybody wants to be the "performance" RISC-V vendor, but nobody wants to be the "embedded" RISC-V vendor. This is a problem because practically anybody who is willing to cough up for a "performance" processor is almost completely insensitive to any cost premium that ARM demands. The embedded space is hugely sensitive to cost, but nobody is willing to step into it because that requires that you do icky ecosystem things like marketing, software, debugging tools, inventory distribution, etc.
This leads to the US business problem which is the fact that everybody wants to be an IP vendor and nobody wants to ship a damn chip. Consequently, if I want actual RISC-V hardware, I'm stuck dealing with Chinese vendors of various levels of dodginess.
A pattern I've noticed for a very long time:
A lot of times the path to the highest performing CPU seems to be to optimize for power first, then speed, then repeat. That's because power and heat are a major design constraint that limits speed.
I first noticed this way back with the Pentium 4 "Netburst" architecture vs. the smaller x86 cores that became the ancestor of the Core architecture. Intel eventually ran into a wall with P4 and then branched high performance cores off those lower-power ones and that's what gave us the venerable Core architecture that made Intel the dominant CPU maker for over a decade.
ARM's history is another example.
IF you care to read the article, they indeed do not blame the architecture but the available silicon implementations.
ARM was never a "speed demon"; it started out as a low power small-area core and clearly had more complexity and thought put into it than MIPS or RISC-V.
Over a decade ago: https://news.ycombinator.com/item?id=8235120
RISC-V will get there, eventually.
Strong doubt. Those of us who were around in the 90s might remember how much hype there was with MIPS.
RISC-V lacks a bunch of really useful relatively easy to implement instructions and most extensions are truly optional so you can't rely on them. That's the problem if you let a bunch of academics turn your ISA into a paper mill.
In theory you can spend a lot of effort to make a flawed ISA perform, but it will be neither easy nor pretty e.g. real world Linux distros can't distribute optimised packages for every uarch from dual-issue in-order RV64GC to 8-wide OoO RV64 with all the bells and whistles. Only in (deeply) embedded systems can you retarget the toolchain and optimise for each damn architecture subset you encounter.
In some cases RISC-V ISA spec is definitely the one to blame:
1) https://github.com/llvm/llvm-project/issues/150263
2) https://github.com/llvm/llvm-project/issues/141488
Another example is hard-coded 4 KiB page size which effectively kneecaps ISA when compared against ARM.