RISC-V does not have the pitfalls of experimental ISAs from 45 years ago, but it has other pitfalls that have not existed in almost any ISA since the first vacuum-tube computers, like the lack of means for integer overflow detection and the lack of indexed addressing.
Especially the lack of integer overflow detection is a choice of great stupidity, for which there exists no excuse.
Detecting integer overflow in hardware is extremely cheap, its cost is absolutely negligible. On the other hand, detecting integer overflow in software is extremely expensive, increasing both the program size and the execution time considerably, because each arithmetic operation must be replaced by multiple operations.
Because of the unacceptable cost, normal RISC-V programs choose to ignore the risk of overflows, which makes them unreliable.
The highest performance implementations of RISC-V from previous years were forced to introduce custom extensions for indexed addressing, but those used inefficient encodings, because something like indexed addressing must be in the base ISA, not in an extension.
> On the other hand, detecting integer overflow in software is extremely expensive, increasing both the program size and the execution time considerably,
Most languages don't care about integer overflow. Your typical C program will happily wrap around.
If I really want to detect overflow, I can do this:
add t0, a0, a1
blt t0, a0, overflow
Which is one more instruction, which is not great, not terrible.> On the other hand, detecting integer overflow in software is extremely expensive
this just isn't true. both addition and multiplication can check for overflow in <2 instructions.
OK, look.
Since my previous attempt to measure the impact of trap on signed overflow didn't seem to have moved your position one bit, I thought I'd give it a go in the most representable way I could think of:
I build the same version of clang on a x86, aarch64 and RISC-V system using clang. Then I build another version with the `-ftrapv` flag enabled and compared the compiletimes of compiling programs using these clang builds running on real hardware:
As you can see, once again the overhead of -ftrapv is quite low.Suprizinglt the -ftrapv overhead seems the highest on the Cortex-A78. My guess is that this because clang generates a seperate brk with unique immediate for every overflow check, while on RISC-V it always branches to one unimp per function.
Please tell me if you have a better suggestion for measuring the real world impact.
Or heck, give me some artificial worst case code. That would also be an interesting data point.
Notes:
* The format is mean±variance
* Spacemit X100 is a Cortex-A76 like OoO RISC-V core and A100 an in-order RISC-V core.
* I tried to clock all of the cores to the same frequency of about 2.2GHz. *Except for the A55, which ran at 1.8GHz, but I linearly scaled the results.
* Program A was the chibicc (8K loc) compiler and program B microjs (30K loc).