if it makes things flaky
then it actually is a huge success
because it found a bug you overlooked in both the implementation and the tests
at least if we're talking about unit tests
I remember having a flaky test with random number generation a few years ago. It failed very rarely (like once every few weeks), and when I finally got around to fixing it, it turned out to be an actual issue (an off-by-one error).
This will often break on stuff like daylight saving changes, while almost as often you don't give a rat's ass about the boundary behaviour.
Burma-shave
Only if it becomes obvious why it is flaky. If it's just sometimes broken but really hard to reproduce, then it just gets piled onto the background level of flakiness and never gets fixed.
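
One way to keep a randomized test from ending up in that pile is to draw an explicit seed and put it in the failure message, so a rare failure can be replayed exactly. A rough sketch, not anyone's actual code from this thread: `pick_buggy`, `pick_fixed`, `first_failing_round` and `run_check` are made-up names, and the bug is a stand-in off-by-one of the kind mentioned above.

```python
import random

ITEMS = ["a", "b", "c"]


def pick_buggy(rng, items):
    # Hypothetical bug: randint includes both endpoints,
    # so this can return len(items), which is out of range.
    return rng.randint(0, len(items))


def pick_fixed(rng, items):
    # randrange excludes the upper bound, so the index is always valid.
    return rng.randrange(len(items))


def first_failing_round(pick, seed, rounds=1000):
    """Return the first round where pick() yields an invalid index, or None."""
    rng = random.Random(seed)  # private generator, fully determined by the seed
    for round_no in range(rounds):
        index = pick(rng, ITEMS)
        if not 0 <= index < len(ITEMS):
            return round_no
    return None


def run_check(pick, seed=None):
    # Draw a fresh seed per run unless one is supplied for a replay.
    if seed is None:
        seed = random.SystemRandom().randrange(2**32)
    failed_at = first_failing_round(pick, seed)
    assert failed_at is None, (
        f"invalid index in round {failed_at}; replay with seed={seed}"
    )
    return seed
```

Calling `run_check(pick_buggy)` fails with the seed in the message, and passing that seed back in (`run_check(pick_buggy, seed=...)`) hits the same round again, so the failure stops being "sometimes broken" and becomes a deterministic repro.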