
z3t4, yesterday at 1:50 PM

Had an accidental reboot, and the server could not boot. Had redundancy, but the other server had failed silently days prior. Solved it with three-way redundancy and extra monitoring. Systems can fail in many ways at the same time. If you do not test it, there is a chance it won't work. Controlled failure is preferable to unknowns, e.g. rebooting once in a while just to make sure it still works.
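For the "failed silently days prior" part, a rough sketch of the kind of check that would have caught it: flag any node whose last heartbeat is too old. The node names, the `stale_nodes` helper, and the 5-minute threshold are all made up for illustration, not what was actually deployed.

```python
import time


def stale_nodes(last_heartbeat, now, max_age_seconds=300):
    """Return the names of nodes whose last heartbeat is older than max_age_seconds.

    last_heartbeat maps a (hypothetical) node name to a Unix timestamp.
    """
    return sorted(
        name
        for name, timestamp in last_heartbeat.items()
        if now - timestamp > max_age_seconds
    )


if __name__ == "__main__":
    now = time.time()
    # Assumed example data: one standby stopped reporting three days ago.
    heartbeats = {
        "primary": now - 10,
        "standby-a": now - 3 * 86400,
        "standby-b": now - 20,
    }
    for name in stale_nodes(heartbeats, now):
        print(f"ALERT: no heartbeat from {name} in over 5 minutes")
```

A check like this only helps if something runs it on a schedule and the alert reaches a human, which is the same "test it or it may not work" point applied to the monitoring itself.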