Hacker News

ryandrake yesterday at 8:29 PM

> My experience is that (most) spinners give off reliable pre-failure indicators (if you take the time to look/script looking), but SSDs fail by disappearing from the bus. The SSDs do fail much less often, but they still fail from time to time and recovery is harder.

I'm not a pro, just a smalltime dork with a homelab. I use cheap WD HDDs on my NAS system connected to an LSI hardware RAID controller. I'll boast that I have a 100% record so far of preventing downtime and data loss by simply listening for the controller's audible alarm and swapping drives right away (I keep brand new spares). I also have offline backups, but have so far never needed them. Not sure how this would change if I moved to SSDs.
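For anyone wondering what "script looking" from the quoted comment might look like in practice, here is a minimal sketch, not a definitive monitor. It assumes smartmontools is installed and that the drive reports the usual ATA attribute table from `smartctl -A`. The watched attribute IDs, the "raw value above zero" rule, the function names, and the sample table are all assumptions for illustration, not anything from the thread.

```python
import subprocess

# Attribute IDs often treated as pre-failure hints on spinning disks.
# Both this selection and the "raw value > 0" rule are illustrative assumptions.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def parse_smart_attributes(text):
    """Return {attribute_id: raw_value} parsed from `smartctl -A` style output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row starts with "ID#" and is skipped by the isdigit() check.
        if len(fields) >= 10 and fields[0].isdigit():
            raw = fields[9]
            if raw.isdigit():
                attrs[int(fields[0])] = int(raw)
    return attrs


def smart_warnings(text):
    """List watched attributes whose raw value is non-zero."""
    attrs = parse_smart_attributes(text)
    return [f"{WATCHED[i]}={attrs[i]}" for i in sorted(WATCHED) if attrs.get(i, 0) > 0]


def check_device(device):
    """Run smartctl against a device path such as /dev/sda (usually needs root)."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return smart_warnings(result.stdout)


# Hypothetical sample output, made up for the example.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       21345
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""
```

Calling `smart_warnings(SAMPLE)` flags only the reallocated-sector count here. Run from cron against each disk, something like this could complement a controller's audible alarm, though drives behind a hardware RAID controller may need extra `smartctl` device options to be reachable at all.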


Replies

adastra22 today at 11:36 AM

I have a homelab with 24 disks (2x 12-drive raidz3 pools). In the past 15 years of operation I have replaced many drives. Usually my experience matches yours, but there was one time I got a simultaneous 3-disk failure. One disk failed, then when I replaced it and ran a scrub, two more from the same pool failed in quick succession. I had to scramble to get more spare drives, and didn't sleep much while I waited for the pools to finish the scrub.

Coordinated failure is a thing, unfortunately. Drives bought together tend to fail together.

Felger yesterday at 8:49 PM

Well, SAS disks tend to go into a failed state immediately or very quickly, most of the time without first going through the warning state.

SATA disks are indeed generally more predictable failure-wise. Most issues are related to a failing head stack assembly. Rarely, it's platter demagnetization on some disks (Toshiba laptop).

Other failure issues are usually related to a friggin' manufacturer firmware issue from Dell, HP or Lenovo.