My normal issue with floating-point epsilon shenanigans is that they don't usually pass the sniff test, suggesting something fundamentally wrong with the problem framing or its solution.
It's a classic, so let's take vector normalization as an example. Topologically, you're ripping a hole in the space, and that's causing your issues. It manifests as NaN for length-zero vectors, weird precision issues too close to zero, etc, but no matter what you employ to try to fix it you're never going to have a good time squishing N-D space onto the surface of an N-D sphere if you need it to be continuous.
Some common subroutines where I see this:
1. You want to know the average direction of a bunch of objects and thus have to normalize each vector contributing to that average. Solution 1: That's not what you want almost ever. In any of the sciences, or anything loosely approximating the real world, you want to average the un-normalized vectors 99.999% of the time. Solution 2: Maybe you really do need directions for some reason (e.g., tracking where birds are looking in a game). Then don't rely on vectors for your in-band signaling. Explicitly track direction and magnitude separately and observe the magic of never having direction-related precision errors.
2. You're doing some sort of lighting normalization and need to compute something involving areas of potentially near-degenerate triangles, dividing by those values to weight contributions appropriately. Solution: Same as above, this is kind of like an average of averages problem. It can make fuzzy, intuitive sense, but you'll get better results if you do your summing and averaging in an un-normalized space. If you really do need surface normals, store those explicitly and separate from magnitude.
3. You're doing some sort of ML voodoo to try to get better empirical results via some vague appeal to vanishing gradients or whatever. Solution: The core property you want is a somewhat strange constraint on your layer's Jacobian matrix, and outside of like two papers nobody is willing to put up with the code complexity or runtime costs, even when they recognize it as the right thing to do. Everything you're doing is a hack anyway, so make your normalization term x/(|x|+eps) with eps > 0 rather than equal to zero like normal. Choose eps much smaller than most of the vectors you're normalizing this way and much bigger than zero. Something like 1e-3, 1e-20, and 1e-150 should be fine for f16, f32, and f64. You don't have to tune because it's a pretty weak constraint on the model, and it's able to learn around it.