Hacker News

purplesyringa yesterday at 6:20 AM

The paper doesn't require a bitshift after multiplication -- it directly uses the high half of the product as the quotient, so it saves at least one tick over the solution you mentioned. And on x86, saturating addition can't be done in a tick and 32->64 zero-extension is implicit, so the gap is even wider.
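
To make the "high half of the product as the quotient" idea concrete, here is a minimal sketch of one well-known way to do it for 32-bit dividends. It is not necessarily the paper's exact construction: the helper names `magic_for` and `div_by_magic` are hypothetical, and the multiplier `ceil(2^64 / d)` is an assumption chosen for illustration (it requires `d >= 2`).

```rust
/// Precompute a 64-bit multiplier for dividing 32-bit values by `d`.
/// Hypothetical helper for illustration; requires d >= 2, because
/// ceil(2^64 / 1) would not fit in 64 bits.
fn magic_for(d: u32) -> u64 {
    assert!(d >= 2, "d = 1 would need a 65-bit multiplier");
    // floor((2^64 - 1) / d) + 1 equals ceil(2^64 / d) for every d >= 2.
    u64::MAX / d as u64 + 1
}

/// Compute n / d from the precomputed multiplier.
/// The quotient is the high 64 bits of the 128-bit product. The `>> 64`
/// only selects that high half; no further corrective shift is applied.
fn div_by_magic(n: u32, magic: u64) -> u32 {
    let product = (magic as u128) * (n as u128);
    (product >> 64) as u32
}
```

For example, `div_by_magic(100, magic_for(7))` yields `14`. The rounding error in the multiplier is smaller than `d`, so for any 32-bit `n` it cannot push the high half past the true quotient.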


Replies

aleph_minus_one yesterday at 9:36 AM

> And on x86, saturating addition can't be done in a tick

Perhaps I misunderstand your point, but I am rather sure that in SSE.../AVX... there do exist instructions for saturating addition:

* (V)PADDSB, (V)PADDSW, (V)PADDUSB, (V)PADDUSW

* (V)PHADDSW, (V)PHSUBSW
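
As a rough illustration of what the packed saturating adds listed above compute, here is a portable scalar model of the lane semantics. It is only a sketch: the function names are hypothetical, the code does not invoke the SIMD instructions themselves, and it models just the unsigned-byte and signed-word variants. Note that these instructions work on 8-bit and 16-bit lanes.

```rust
/// Model of an unsigned saturating add over sixteen byte lanes
/// (the behaviour PADDUSB has on a 128-bit register).
fn add_sat_u8_lanes(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        // Clamps at 255 instead of wrapping around to a small value.
        out[i] = a[i].saturating_add(b[i]);
    }
    out
}

/// Model of a signed saturating add over eight 16-bit lanes
/// (the behaviour PADDSW has on a 128-bit register).
fn add_sat_i16_lanes(a: [i16; 8], b: [i16; 8]) -> [i16; 8] {
    let mut out = [0i16; 8];
    for i in 0..8 {
        // Clamps at i16::MAX or i16::MIN instead of overflowing.
        out[i] = a[i].saturating_add(b[i]);
    }
    out
}
```

With every byte lane set to 200 and 100, the saturating result is 255 per lane, whereas a wrapping add would give 44.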
