Yes, my monitoring system not alerting me when the systems it runs on are failing is the entire problem.
That's not a general "software breaks when disks fail" situation: that's a monitoring system failing at its one job.
Your monitoring system failing silently when your infrastructure is under stress is precisely the failure mode that monitoring exists to prevent.
Zabbix solves this with native HA and self-checks. Prometheus makes it your problem to solve with external tooling, and most people don't, until they need it.
Why wouldn't your monitoring system alert you when metrics suddenly disappear? Sounds like you need a better monitoring system, prometheus is not gonna magically solve that problem for you. No wonder you were having issues with prometheus...