What specifically are you saying can make a difference?
I'm saying that the extra overhead from making your lock work across processes should be tiny. Compared to an in-process lock, it shouldn't add much more than a microsecond of latency or CPU time.
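If anyone wants to check this on their own machine, here is a rough sketch of how I would measure it. It assumes Python's standard library, where threading.Lock is an in-process lock and multiprocessing.Lock is backed by an OS-level semaphore that can be shared across processes. The helper names (per_op_seconds, compare) are made up for this post. It only times the uncontended case from a single process, and the numbers include interpreter overhead, so treat the difference between the two as a rough indication rather than a precise figure.

```python
import multiprocessing
import threading
import time


def per_op_seconds(lock, iterations=200_000):
    """Average wall-clock time for one uncontended acquire/release pair."""
    start = time.perf_counter()
    for _ in range(iterations):
        lock.acquire()
        lock.release()
    return (time.perf_counter() - start) / iterations


def compare(iterations=200_000):
    """Return (in_process, cross_process) seconds per acquire/release pair."""
    in_process = per_op_seconds(threading.Lock(), iterations)
    # multiprocessing.Lock can be shared with child processes; here it is
    # only exercised from one process, so there is no contention.
    cross_process = per_op_seconds(multiprocessing.Lock(), iterations)
    return in_process, cross_process


if __name__ == "__main__":
    in_process, cross_process = compare()
    print(f"threading.Lock:       {in_process * 1e9:.0f} ns per pair")
    print(f"multiprocessing.Lock: {cross_process * 1e9:.0f} ns per pair")
    print(f"difference:           {(cross_process - in_process) * 1e9:.0f} ns")
```

The contended case is a separate question, since that is where the kernel wait and wake-up path comes in.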
You were saying that "reasonable overhead" makes no difference because something "isn't called much". That is ambiguous, and it is also not true: latency matters even when a call is infrequent.
Which calls specifically are you comparing between Windows and Linux? This thread was started by someone talking about WaitForMultipleObjects.